We introduce exponential search trees as a novel technique for converting static 
polynomial space search structures for ordered sets into fully-dynamic linear space 
data structures. 

This leads to an optimal bound of $O(\sqrt{\log n/\log\log n})$ for searching and updating 
a dynamic set of n integer keys in linear space. Here searching an integer y means 
finding the maximum key in the set which is smaller than or equal to y. This problem 
is equivalent to the standard text book problem of maintaining an ordered set (see, 
e.g., Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms, 2nd ed., MIT 
Press, 2001). 

The best previous deterministic linear space bound was $O(\log n/\log\log n)$, due to Fredman 
and Willard from STOC 1990. No better deterministic search bound was known 
using polynomial space. 

We also get the following worst-case linear space trade-offs between the number n, 
the word length W, and the maximal key $U < 2^W$: 
$O(\min\{\log\log n + \log n/\log W,\ \log\log n \cdot \frac{\log\log U}{\log\log\log U}\})$. 
These trade-offs are, however, not likely to be optimal. 

Our results are generalized to finger searching and string searching, providing op- 
timal results for both in terms of n. 



*This paper combines results presented by the authors at the 37th FOCS 1996 [2], the 32nd STOC 2000 
[5], and the 12th SODA 2001. 
1 Introduction 



1.1 The Textbook Problem 

Maintaining a dynamic ordered set is a fundamental textbook problem. For example, following 
Cormen, Leiserson, Rivest, and Stein (Introduction to Algorithms, Part III), the basic 
operations are: 

Search (S, k) Returns a pointer to an element in S with key k, or returns a null pointer if k 
is not in S. 

Insert (S, x) Add x to S; here x is a pointer to an element. The data structure may 
associate information with x, e.g., parent and child pointers in binary search trees. 

Delete (S, x) Remove x from S; here x is a pointer to an element in S. 

Predecessor/Successor (S, x) Given that x points at an element in S, return a pointer 
to the next smaller/larger element in S (or a null pointer if no such element exists). 

Minimum/Maximum (S) Return a pointer to the smallest/largest element in S (or a null 
pointer if S is empty). 

To make the ordering total, we follow the convention that if two elements have the same key, 
the last inserted element is larger. 

For keys that can only be accessed via comparisons, all of the above operations can be 
supported in $O(\log n)$ time¹, which is best possible. 

However, on computers, integers and floating point numbers are the most common ordered 
data types. For such data types, represented as lexicographically ordered, we can apply 
classical non-comparison based techniques such as radix sort and hashing. Historically, we 
note that radix sort dates back at least to 1929 ^2] and hashing dates back at least to 1956 
[To], whereas the focus on general comparison based methods only dates back to 1959 [To]. 

In this paper, we consider the above basic data types of integers and floating point numbers. 
Our main result is that we can support all the above operations in $O(\sqrt{\log n/\log\log n})$ 
worst-case time, and this common bound is best possible. 

The lower bound follows from a result of Beame and Fich [7]. It shows that even if we 
just want to support Insert and Predecessor in polynomial space, one of these two operations 
has a worst-case bound of $\Omega(\sqrt{\log n/\log\log n})$, matching our common upper bound. We 
note that one can find better bounds and trade-offs for some of the individual operations. 
Indeed, we will support Min, Max, Predecessor, Successor, and Delete in constant time, and 
only do Insert and Search in $O(\sqrt{\log n/\log\log n})$ time. 

It is also worth noticing that if we just want to consider an incremental dictionary with 
Insert and Search, then our $O(\sqrt{\log n/\log\log n})$ Search time is the best known with 
Insert time. 

¹ We use the convention that logarithms are base 2 unless otherwise stated. Also, n is the number of 
stored elements. 
1.2 Extending the search operation 

In an ordered set, it is common to consider an extended version of Search: 

Search (S, k) Returns a pointer to an element in S with the largest key smaller than or 
equal to k, or null if k is smaller than any key in S. 
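
As a point of reference, the extended search can be sketched on a plain sorted Python list. This is only the comparison-based $O(\log n)$ baseline, not the structure developed in this paper; the function name `search` and the choice to return keys rather than element pointers are assumptions made for illustration.

```python
from bisect import bisect_right


def search(sorted_keys, k):
    """Extended Search: largest stored key <= k, or None if k is below every key."""
    i = bisect_right(sorted_keys, k)  # i = number of stored keys that are <= k
    return sorted_keys[i - 1] if i > 0 else None


keys = [3, 8, 15, 42]
assert search(keys, 15) == 15    # key present: it is returned itself
assert search(keys, 20) == 15    # key absent: the predecessor is returned, not null
assert search(keys, 2) is None   # smaller than every stored key
```

A look-up, by contrast, would return null in the second case as well.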

Thus, if the key is not there, we do not just return a null pointer. It is for this extended search 
that we provide our $O(\sqrt{\log n/\log\log n})$ upper bound. It is also for this search operation 
that Beame and Fich [7] proved their $\Omega(\sqrt{\log n/\log\log n})$ lower bound, and that was even 
for a single extended search in a static set S for any given representation in polynomial 
space. To see that this gives a lower bound just for Insert and Predecessor, we can solve 
the static predecessor problem as follows. First we insert all the elements of S to create a 
representation of S. The lower bound of Beame and Fich does not care about the time for this 
preprocessing. To search k, we insert an element with key k and ask for the predecessor of this 
element. The lower bound of Beame and Fich then asserts that the Insert and Predecessor 
operations together take $\Omega(\sqrt{\log n/\log\log n})$ time in the worst case, hence that at least one 
of the operations has a worst-case lower bound of $\Omega(\sqrt{\log n/\log\log n})$. 
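
The reduction above can be sketched as follows. The class `DynamicSet` is a hypothetical stand-in (a sorted Python list) for any structure offering Insert and Predecessor; the names and the tuple handles are assumptions for illustration only. Ties follow the convention from Section 1.1 that the last inserted element is larger, which is what makes the probe's predecessor the extended search result.

```python
from bisect import bisect_left, insort
from itertools import count


class DynamicSet:
    """Stand-in dynamic ordered set; only Insert and Predecessor are needed."""

    def __init__(self):
        self._items = []        # sorted list of (key, insertion_number) handles
        self._clock = count()

    def insert(self, key):
        # Among equal keys, a later insertion number makes the element larger.
        handle = (key, next(self._clock))
        insort(self._items, handle)
        return handle

    def predecessor(self, handle):
        i = bisect_left(self._items, handle)
        return self._items[i - 1] if i > 0 else None


def static_search(stored_keys, k):
    """Extended search in a static set using only Insert and Predecessor."""
    s = DynamicSet()
    for key in stored_keys:     # preprocessing: not charged by the lower bound
        s.insert(key)
    probe = s.insert(k)         # one Insert ...
    pred = s.predecessor(probe) # ... followed by one Predecessor
    return pred[0] if pred is not None else None


assert static_search([10, 20, 30], 25) == 20
assert static_search([10, 20, 30], 20) == 20
assert static_search([10, 20, 30], 5) is None
```

The sketch rebuilds the set on every call purely to keep it short; in the argument itself the set is built once.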

In the rest of this paper, search refers to the extended version whereas the primitive 
version, returning null if the key is not there, is referred to as a look-up. 

We will always maintain a sorted doubly-linked list with the stored elements and a dis- 
tinguished head and tail. With this list we support Successor, Predecessor, Minimum, and 
Maximum in constant time. Then Insert subsumes a Search identifying the element after 
which the key is to be inserted. Similarly, if we want to delete by a key value, rather than 
by a pointer to an element, a Search, or look-up, identifies an element with the key to be 
deleted, if any. 

To isolate the search cost from the update cost, we talk about finger updates, where 
for Insert, we are given the element after which the key is to be inserted, and for Delete, 
we are given the element to be deleted. Then an update with a key value is implemented 
with a Search followed by a finger update. As we shall discuss later in Section 1.8, we will 
implement all finger updates in constant time. However, below, we mostly discuss common 
upper bounds for searching and updating. 
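
The list part of this scheme can be sketched as below. This is a minimal illustration of the sorted doubly-linked list with head and tail sentinels and of finger updates on it; the class and method names are hypothetical, and the sketch does not include the search structure that the paper maintains alongside the list. The caller is assumed to pass a finger that keeps the list sorted.

```python
class Element:
    """List cell holding one key; the head and tail sentinels carry key None."""

    def __init__(self, key=None):
        self.key = key
        self.prev = None
        self.next = None


class SortedLinkedList:
    def __init__(self):
        self.head = Element()
        self.tail = Element()
        self.head.next = self.tail
        self.tail.prev = self.head

    def finger_insert(self, after, key):
        """Insert key right after the element `after` (the finger)."""
        node = Element(key)
        node.prev, node.next = after, after.next
        after.next.prev = node
        after.next = node
        return node

    def finger_delete(self, node):
        """Unlink the element the finger points at."""
        node.prev.next = node.next
        node.next.prev = node.prev

    def minimum(self):
        return self.head.next if self.head.next is not self.tail else None

    def maximum(self):
        return self.tail.prev if self.tail.prev is not self.head else None

    def keys(self):
        out, node = [], self.head.next
        while node is not self.tail:
            out.append(node.key)
            node = node.next
        return out
```

Successor and Predecessor are then just the `next` and `prev` fields, with the sentinels playing the role of null.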

1.3 History 

At STOC'90, Fredman and Willard [18] surpassed the comparison-based lower bounds for 
integer sorting and searching using the features available in standard imperative programming 
languages such as C. Their key result was an $O(\log n/\log\log n)$ time bound for 
deterministic dynamic searching in linear space. The time bounds for dynamic searching 
include both searching and updates. Fredman and Willard's dynamic searching immediately 
provided them an $O(n \log n/\log\log n)$ sorting routine. They asked the fundamental question: 
how fast can we search [integers on a RAM]? Since then, much effort has been spent 
on finding the inherent complexity of fundamental searching and sorting problems. 
In this paper, we introduce exponential search trees as a novel technique for converting 
static polynomial space search structures for ordered sets into fully-dynamic linear space 
data structures. Based on this we get an optimal worst-case bound of $O(\sqrt{\log n/\log\log n})$ 
for dynamic integer searching. We note that this also provides the best bounds for the 
simpler problem of membership and look-up queries if one wants updates in time [21]. 

Our results also extend to optimal finger search with constant finger update, and to 
optimal string searching. 

1.4 Model of computation 

Our algorithms run on a RAM, which models what we program in imperative programming 
languages such as C. The memory is divided into addressable words of length W. Addresses 
are themselves contained in words, so $W \ge \log n$. Moreover, we have a constant number of 
registers, each with capacity for one word. The basic instructions are: conditional jumps, 
direct and indirect addressing for loading and storing words in registers, and some compu- 
tational instructions, such as comparisons, addition, and multiplication, for manipulating 
words in registers. The space complexity is the maximal memory address used, and the time 
complexity is the number of instructions performed. All keys are assumed to be integers 
represented as binary strings, each fitting in one word. One important feature of the RAM 
is that it can use keys to compute addresses, as opposed to comparison-based models of 
computation. This feature is commonly used in practice, for example in bucket or radix 
sort. 
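
To illustrate what "using keys to compute addresses" means, here is a minimal bucket sort sketch. The function name and the `universe` parameter (an exclusive upper bound on the keys) are assumptions for illustration; the point is only that the key itself indexes memory, which a comparison-based model cannot express.

```python
def bucket_sort(keys, universe):
    """Sort non-negative integers smaller than `universe` by direct addressing."""
    counts = [0] * universe
    for key in keys:
        counts[key] += 1          # the key itself is used as an address
    out = []
    for value, occurrences in enumerate(counts):
        out.extend([value] * occurrences)
    return out


assert bucket_sort([3, 1, 4, 1, 5], 8) == [1, 1, 3, 4, 5]
```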

The restriction to integers is not as severe as it may seem. Floating point numbers, 
for example, are ordered correctly, simply by perceiving their bit-string representation as 
representing an integer. Another example of the power of integer ordering is fractions of two 
one-word integers. Here we get the right ordering if we carry out the division with floating 
point numbers with 2W bits of precision, and then just perceive the result as an integer. 
The above examples illustrate how integer ordering can capture many seemingly different 
orderings that we would naturally be interested in. 
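
The floating point remark can be checked with a small sketch. It assumes IEEE 754 doubles and restricts itself to non-negative values; since the format is sign-magnitude, negative numbers would need an extra sign-handling step that is not shown here. The helper name `float_bits` is hypothetical.

```python
import struct


def float_bits(x):
    """Reinterpret the 64 bits of an IEEE 754 double as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


values = [2.0, 1e-300, 1.5, 0.0, 1e300, 0.5, 1.0]
# Sorting by the integer view gives the same order as sorting the floats.
assert sorted(values, key=float_bits) == sorted(values)
```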

The above word model is equivalent to a restriction that one only has unit cost operations 
for integers that are polynomial in n and the integers in X. The latter formulation goes back 
to Kirkpatrick and Reisch in 1984 [24]. We note that if we do not somehow limit the size of 
the unit-cost integers, we get NP=P unless we start ruling out common instructions such as 
multiplication and shifts. 

1.5 Historical developments 

As mentioned above, in 1990, Fredman and Willard [18] showed that one can do dynamic 
searching in $O(\log n/\log\log n)$ time. They also showed that the $O(\log n/\log\log n)$ bound 
could be replaced by an $O(\sqrt{\log n})$ bound if we allow randomization or space unbounded in 
terms of n. Fredman and Willard's original construction was amortized, but in 1992, Willard 
[36, Lemma 3.3] showed that the update bounds can be de-amortized so as to get worst-case 
bounds for all operations. 
In 1996, Andersson [2] introduced exponential search trees as a general technique reducing 
the problem of searching a dynamic set in linear space to the problem of creating a search 
structure for a static set in polynomial time and space. The search time for the static set 
essentially becomes the amortized search time in the dynamic set. From Fredman and 
Willard [18], he got a static search structure with $O(\sqrt{\log n})$ search time, and thus he 
obtained an $O(\sqrt{\log n})$ time bound for dynamic searching in linear space. 

In 1999, Beame and Fich showed that $\Theta(\sqrt{\log n/\log\log n})$ is the exact complexity of 
searching a static set using polynomial space [7]. Using the above mentioned exponential 
search trees, this gave them $\Theta(\sqrt{\log n/\log\log n})$ amortized cost for dynamic searching in 
linear space. 

Finally, in 2000, Andersson and Thorup [5] developed a worst-case version of exponential 
search trees, giving an optimal $O(\sqrt{\log n/\log\log n})$ worst-case time bound for dynamic 
searching. This is the main result presented in this paper. 

1.6 Bounds in terms of the word length and the maximal key 

Besides the above mentioned bounds in terms of n, we get the following worst-case linear 
space trade-offs between the number n, the word length W, and the maximal key $U < 2^W$: 
$O(\min\{\log\log n + \log n/\log W,\ \log\log n \cdot \frac{\log\log U}{\log\log\log U}\})$. The last bound should be compared 
with van Emde Boas' bound of $O(\log\log U)$ [331 HH] that requires randomization (hashing) 
in order to achieve linear space [2Z1-. We note that these bounds are probably not optimal. 
The best lower bound on searching in terms of U is Beame and Fich's $\Omega(\frac{\log\log U}{\log\log\log U})$ for the 
static case. 

1.7 $AC^0$ operations 

As an additional challenge, Fredman and Willard [18] asked how quickly we can search on a 
RAM if all the computational instructions are $AC^0$ operations. A computational instruction 
is an $AC^0$ operation if it is computable by a $W^{O(1)}$-sized constant depth circuit with $O(W)$ 
input and output bits. In the circuit we may have negation, and-gates, and or-gates with 
unbounded fan-in. Addition, shift, and bit-wise boolean operations are all $AC^0$ operations. 
On the other hand, multiplication is not. Fredman and Willard's own techniques [15] were 
heavily based on multiplication, but, as shown in [3], they can be implemented with $AC^0$ 
operations if we allow some non-standard operations that are not part of the usual instruction 
set. However, here we are interested in algorithms using only standard operations so that 
they can be implemented in a standard programming language such as C. 

Concerning searching, our $O(\sqrt{\log n/\log\log n})$ search structure is strongly based on 
multiplication. So far, even if we allow amortization and randomization, no search structure 
using standard $AC^0$ operations has been presented using polynomial space and $o(\log n)$ time, 
not even for the static case. Without requirements of polynomial space, Andersson pQ has 
presented a deterministic worst-case bound of $O(\sqrt{\log n})$. In this paper, we will present a 
linear space worst-case $AC^0$ bound of $O((\log n)^{3/4+o(1)})$, thus surpassing the $O(\log n)$ bound 
even in this restricted case. 
1.8 Finger searching 

By finger search we mean that we can have a "finger" pointing at a stored key x when 
searching for a key y. Here a finger is just a reference returned to the user when x is inserted 
or searched for. The goal is to do better if the number q of stored keys between x and y 
is small. Also, we have finger updates, where for deletions, one has a finger on the key to 
be deleted, and for insertions, one has a finger to the key after which the new key is to be 
inserted. 

In the comparison-based model of computation, Dietz and Raman have provided 
optimal bounds, supporting finger searches in $O(\log q)$ time while supporting finger updates 
in constant time. Very recently, Brodal et al. [9] have managed to match these results on a 
pointer machine. 

In this paper we present optimal bounds on the RAM; namely $O(\sqrt{\log q/\log\log q})$ for 
finger search with constant time finger updates. Also, we present the first finger search 
bounds that are efficient in terms of the absolute distance $|y - x|$ between x and y. 

1.9 String searching 

We will also consider the case of string searching where each key may be distributed over 
multiple words. Strings are then ordered lexicographically. One may instead be interested in 
variable length multiple-word integers where integers with more words are considered larger. 
However, by prefixing each integer with its length, we reduce this case to lexicographic string 
searching. 
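
The length-prefix reduction can be illustrated with tuples of words, which Python compares lexicographically. The sketch assumes integers are stored most significant word first without leading zero words; the helper name `length_prefixed` is hypothetical.

```python
def length_prefixed(words):
    """Prefix a multi-word integer (most significant word first) with its word count."""
    return (len(words),) + tuple(words)


a = (7, 0)       # a two-word integer
b = (1, 0, 0)    # a three-word integer, hence the larger one

# Plain lexicographic comparison gets the order wrong, since 7 > 1 ...
assert a > b
# ... but after prefixing the lengths, lexicographic order is numeric order.
assert length_prefixed(a) < length_prefixed(b)
```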

Generalizing search data structures for string searching is nontrivial even in the simpler 
comparison-based setting. The first efficient solution was presented by Mehlhorn §111]. 
While the classical method requires weight-balanced search structures, our approach contains 
a direct reduction to any unweighted search structure. With inspiration from EJ we 
show that if the longest common prefix between a key y and the stored keys has $\ell$ words, we can 
search y in $O(\ell + \sqrt{\log n/\log\log n})$ time, where n is the current number of keys. Updates 
can be done within the same time bound. Assuming that we can address the stored keys, 
our extra space bound is $O(n)$. 

The above search bound is optimal, for consider an instance of the 1-word dynamic 
search problem, and give all keys a common prefix of $\ell$ words. To complete a search we 
both need to check the prefix in $\Omega(\ell)$ time, and to perform the 1-word search, which takes 
$\Omega(\sqrt{\log n/\log\log n})$ time, for a total of $\Omega(\ell + \sqrt{\log n/\log\log n})$. 

Note that one may think of the strings as divided into characters much smaller than 
words. However, if we only deal with one such character at a time, we are not exploiting 
the full power of the computer at hand. 

1.10 Techniques and main theorems 

Our main technical contribution is to introduce exponential search trees providing a general 
reduction from the problem of maintaining a worst-case dynamic linear space search structure to 
the simpler problem of constructing a static search structure in polynomial time and space. 
For example, the polynomial construction time allows us to construct a dictionary 
deterministically with look-ups in constant time. Thus we can avoid the use of randomized hashing 
in, e.g., a van Emde Boas style data structure [331 UH HZ]. The reduction is captured by 
the following theorem: 

Theorem 1 Suppose a static search structure on d integer keys can be constructed in 
$O(d^{k-1})$, $k \ge 2$, time and space so that it supports searches in $S(d)$ time. We can then 
construct a dynamic linear space search structure that with n integer keys supports insert, 
delete, and searches in time $T(n)$ where 

$$T(n) \le T(n^{1-1/k}) + O(S(n)). \qquad (1)$$

The reduction itself uses only standard $AC^0$ operations. 
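
To get a feeling for recurrence (1), the following sketch unrolls it numerically. It tracks $\log_2 n$ rather than n, since one step $n \to n^{1-1/k}$ multiplies $\log n$ by $1 - 1/k$; the number of levels is therefore $O(\log\log n)$. The function names, the stopping threshold, and the use of $S(n) = \sqrt{\log n}$ in the example are assumptions for illustration, with all constants ignored.

```python
import math


def recursion_levels(log_n, k):
    """Number of times n -> n^(1-1/k) is applied before log2(n) drops to 1."""
    levels = 0
    while log_n > 1:
        log_n *= 1.0 - 1.0 / k   # log of n^(1-1/k) is (1-1/k) * log n
        levels += 1
    return levels


def total_search_cost(log_n, k, S):
    """Sum of S over the subproblem sizes met by the recursion.

    S takes log2 of the subproblem size as its argument.
    """
    total = 0.0
    while log_n > 1:
        total += S(log_n)
        log_n *= 1.0 - 1.0 / k
    return total


# With k = 2 and n = 2^64 the recursion has 6 levels: log n = 64, 32, 16, 8, 4, 2.
assert recursion_levels(64, 2) == 6
# With S(n) = sqrt(log n) the terms shrink geometrically, so the sum stays
# within a constant factor of the top-level term sqrt(64) = 8.
assert total_search_cost(64, 2, math.sqrt) < 3.5 * math.sqrt(64)
```

When S is constant per level instead, the sum is simply proportional to the number of levels, which is where additive $\log\log n$ terms come from.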

We then prove the following result on static data structures: 

Theorem 2 In polynomial time and space, we can construct a deterministic data structure 
over d keys supporting searches in $O(\min\{\sqrt{\log d},\ \log\log U,\ 1 + \frac{\log d}{\log W}\})$ time where W is the 
word length, and $U < 2^W$ is an upper bound on the largest key. If we restrict ourselves to 
standard $AC^0$ operations, we can support searches in $O((\log d)^{3/4+o(1)})$ worst-case time per 
operation. 



Above, the $\sqrt{\log d}$ and $\log\log U$ bounds were recently improved: 

Theorem 3 (Beame and Fich [7]) In polynomial time and space, we can 
construct a deterministic data structure over d keys supporting searches in 
$O(\min\{\sqrt{\log d/\log\log d},\ \frac{\log\log U}{\log\log\log U}\})$ time. 

Applying the recursion from Theorem 1, substituting $S(d)$ with (i) the two bounds in 
Theorem 3, (ii) the last bound in the min-expression in Theorem 2, and (iii) the $AC^0$ bound from 
Theorem 2, we immediately get the following four bounds: 

Corollary 4 There is a fully-dynamic deterministic linear space search structure supporting 
insert, delete, and searches in worst-case time 

$$O\left(\min\left\{\sqrt{\log n/\log\log n},\ \ \log\log n \cdot \frac{\log\log U}{\log\log\log U},\ \ \log\log n + \frac{\log n}{\log W}\right\}\right) \qquad (2)$$

where W is the word length, and $U < 2^W$ is an upper bound on the largest key. If we restrict 
ourselves to standard $AC^0$ operations, we can support all operations in $O((\log n)^{3/4+o(1)})$ 
worst-case time per operation. 



It follows from the lower bound by Beame and Fich [7] that our $O(\sqrt{\log n/\log\log n})$ bound 
is optimal. 
1.10.1 Finger search 

A finger search version of Theorem 1 leads us to the following finger search version of 
Corollary 4: 

Theorem 5 There is a fully-dynamic deterministic linear space search structure that supports 
finger updates in constant time, and given a finger to a stored key x, searches a key 
$y > x$ in time 



where q is the number of stored keys between x and y. If we restrict ourselves to $AC^0$ 



1.10.2 String searching 

We also present a general reduction from string searching to 1-word searching: 

Theorem 6 For the dynamic string searching problem, if the longest common prefix between 
a key x and the stored keys has $\ell$ words, we can insert, delete, and search x in 
$O(\ell + \sqrt{\log n/\log\log n})$ time, where n is the current number of keys. In addition to the stored keys 
themselves, our space bound is $O(n)$. 

1.11 Contents 

First, in Section 2, we present a simple amortized version of exponential search trees, and 
then we de-amortize them in Section 3. In Section 4, we construct the static search structures 
to be used in the exponential search tree. In Section 5, we describe the data structure for 
finger searching. In Section 6, we describe the data structure for string searching. In Section 
7, we give examples of how the techniques of this paper have been applied in other work. 
Finally, in Section 8, we finish with an open problem. 

2 The main ideas and concepts in an amortized setting 

Before presenting our worst-case exponential search trees, we here present a simpler amortized 
version from [2], converting static data structures into fully-dynamic amortized search 
structures. The basic definitions and concepts of the amortized construction will be assumed 
for the more technical worst-case construction. 

An exponential search tree is a leaf-oriented multiway search tree where the degrees of 
the nodes decrease doubly-exponentially down the tree. By leaf-oriented, we mean that all 
keys are stored in the leaves of the tree. Moreover, with each node, we store a splitter for 
navigation: if a key arrives at a node, a local search among the splitters of the children 
determines which child it belongs under. Thus, if a child v has splitter s and its successor 
has splitter s', a key y belongs under v if $y \in [s, s')$. We require that the splitter of an 
internal node equals the splitter of its leftmost child. 

We also maintain a doubly-linked list over the stored keys, providing successor and 
predecessor pointers as well as maximum and minimum. A search in an exponential search tree 
may bring us to the successor of the desired key, but if the found key is too big, we just return 
its predecessor. 

In our exponential search trees, the local search at each internal node is performed using 
a static local search structure, called an S-structure. We assume that an S-structure over d 
keys can be built in $O(d^{k-1})$ time and space and that it supports searches in $S(d)$ time. We 
define an exponential search tree over n keys recursively: 

• The root has degree $\Theta(n^{1/k})$. 

• The splitters of the children of the root are stored in a local S-structure with the 
properties stated above. 

• The subtrees are exponential search trees over $\Theta(n^{1-1/k})$ keys. 

It immediately follows that searches are supported in time $T(n) = O(S(O(n^{1/k}))) + 
T(O(n^{1-1/k}))$, which is essentially the time bound we are aiming at. 
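
The recursive definition and the search procedure can be sketched as a static structure. This is only an illustration under several assumptions: Python's `bisect` on a sorted splitter list stands in for the S-structure, small subtrees are stored as sorted leaf buckets instead of single-key leaves, and the class name `ExpNode` and the `leaf_size` cut-off are hypothetical. The input list must be sorted and non-empty.

```python
import math
from bisect import bisect_right


class ExpNode:
    """Static sketch: root degree about n^(1/k), subtrees over about n^(1-1/k) keys."""

    def __init__(self, keys, k=2, leaf_size=4):
        self.splitter = keys[0]              # equals the splitter of the leftmost child
        self.children = None
        if len(keys) <= leaf_size:
            self.keys = keys                 # small base case kept as a sorted bucket
            return
        degree = max(2, round(len(keys) ** (1.0 / k)))
        size = math.ceil(len(keys) / degree)
        self.children = [ExpNode(keys[i:i + size], k, leaf_size)
                         for i in range(0, len(keys), size)]
        # Stand-in for the S-structure over the children's splitters.
        self.splitters = [child.splitter for child in self.children]

    def search(self, y):
        """Largest stored key <= y, or None if y is smaller than every key."""
        if y < self.splitter:
            return None
        node = self
        while node.children is not None:
            # Local search: the child whose range [s, s') contains y.
            i = bisect_right(node.splitters, y) - 1
            node = node.children[i]
        return node.keys[bisect_right(node.keys, y) - 1]


keys = list(range(0, 300, 3))
root = ExpNode(keys)
assert root.search(10) == 9
assert root.search(-1) is None
```

Each step of the loop is one local search followed by a descent into a subtree of roughly $n^{1-1/k}$ keys, mirroring the two terms of the bound above.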

An exponential search tree over n keys takes linear space. The space of the S-structure 
at a node of degree d is $O(d^{k-1})$, and the total space $C(n)$ is essentially given by 

$$C(n) = O\left((n^{1/k})^{k-1}\right) + n^{1/k} \cdot C(n^{1-1/k}) \quad\Rightarrow\quad C(n) = O(n).$$

The above calculation is not complete since we only recurse to subproblems of size $O(n^{1-1/k})$, 
and then the direct solution is not linear. However, so far we are only sketching a simple 
amortized version in order to introduce the general ideas. A rigorous argument will be given 
for the real worst-case version (cf. Lemma IT^j). 

Since $O(d^{k-1})$ bounds not only the space but also the construction time for the S-structure 
at a degree d node, the same argument gives that we can construct an exponential search 
tree over n keys in linear time. 

Recall that an update is implemented as a search, as described above, followed by a finger 
update. A finger delete essentially just removes the leaf with the element. However, if the 
leaf is the first child of its parent, its splitter has to be transferred to its successor. For a 
finger insert of an element u with key k, we get the element v after which the key is to be 
inserted. We then also have to consider the successor v' of v. Let s be the splitter of v'. If 
$k < s$, we place u after v under the parent of v, and give u splitter k. If $k \ge s$, we place u 
before v' under the parent of v', and give u splitter s and v' its own key as splitter. 

Balance is maintained in a standard fashion by global and partial rebuilding. By the 
weight, $|t|$, of a (sub-)tree t we mean the number of leaves in t. By the weight, $|v|$, of a node 
v, we mean the weight of the tree rooted at v. When a subtree gets too heavy, by a factor 
of 2, we split it in two, and if it gets too light, by a factor of 2, we join it with its neighbor. 
Constructing a new subtree rooted at the node v takes $O(|v|)$ time. In addition, we need 
to update the S-structure at v's parent u, in order to reflect the adding or removing of a 
key in u's list of child splitters. Since u has $\Theta(|u|^{1/k})$ children, the construction time for u's 
S-structure is $O((|u|^{1/k})^{k-1}) = O(|u|^{1-1/k})$. By definition, this time is $O(|v|)$. We conclude 
that we can reconstruct the subtrees and update the parent's S-structure in time linear in 
the weight of the subtrees. 

Exceeding the weight constraints requires that a constant fraction of the keys in a subtree 
have been inserted and deleted since the subtree was constructed with proper weight. Thus, 
the reconstruction cost is an amortized constant per key inserted or deleted from a tree. 
Since the depth of an exponential search tree is $O(\log\log n)$, the update cost, excluding the 
search cost for finding out where to update, is $O(\log\log n)$ amortized. This completes our 
sketchy description of amortized exponential search trees. 

3 Worst-case exponential search trees 

The goal of this section is to prove the statement of Theorem 1: 

Suppose a static search structure on d integer keys can be constructed in $O(d^{k-1})$, 
$k \ge 2$, time and space so that it supports searches in $S(d)$ time. We can then 
construct a dynamic linear space search structure that with n integer keys supports 
insert, delete, and searches in time $T(n)$ where $T(n) \le T(n^{1-1/k}) + O(S(n))$. The 
reduction itself uses only standard $AC^0$ operations. 

In order to get from the amortized bounds above to worst-case bounds, we need a new type 
of data structure. Instead of a data structure where we occasionally rebuild entire subtrees, 
we need a multiway tree which is something more in the style of a standard B-tree, where 
balance is maintained by locally joining and splitting nodes. By locally we mean that the 
joining and splitting is done just by joining and splitting the children sequences. This type 
of data structure is for example used by Willard [36] to obtain a worst-case version of fusion 
trees. 

One problem with our current definition of exponential search trees is that the criteria 
for when subtrees are too large or too small depend on their parents. If two subtrees are 
joined, the resulting subtree is larger, and according to our recursive definition, this may 
imply that all of the children simultaneously become too small, so they have to be joined, 
etc. To avoid such cascading effects of joins and splits, we redefine the exponential search 
tree as follows: 

Definition 7 In an exponential search tree all leaves are on the same depth, and we define 
the height or level of a node to be the unique distance from the node to the leaves descending 
from it. For a non-root node v at height $i > 0$, the weight (number of descending leaves) is 
$|v| = \Theta(n_i)$ where $n_i = \alpha^{(1+1/(k-1))^i}$ and $\alpha = \Theta(1)$. If the root has height h, its weight is 
$O(n_h)$. 
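
The weight sequence of Definition 7 can be tabulated with a short sketch, reading the sequence as $n_i = \alpha^{(1+1/(k-1))^i}$. The function name and the sample values $\alpha = 2$, $k = 2$ are assumptions for illustration. Since $1 + 1/(k-1) = k/(k-1)$, consecutive terms satisfy $n_{i-1} = n_i^{1-1/k}$, which is what ties the definition back to the recursive one.

```python
import math


def level_weights(alpha, k, height):
    """n_i = alpha ** ((1 + 1/(k-1)) ** i) for i = 0, ..., height (requires k >= 2)."""
    growth = 1.0 + 1.0 / (k - 1)
    return [alpha ** (growth ** i) for i in range(height + 1)]


weights = level_weights(2.0, 2, 4)   # for k = 2 each weight is the square of the previous
for lower, upper in zip(weights, weights[1:]):
    # n_{i-1} equals n_i^(1-1/k), up to floating point rounding.
    assert math.isclose(lower, upper ** (1.0 - 1.0 / 2))
```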
With the exception of the root, Definition 7 follows our previous definition of exponential 
search trees, that is, if v is a non-root node, it has $\Theta(|v|^{1/k})$ children, each of weight 
$\Theta(|v|^{1-1/k})$. 

Our main challenge is now to rebuild S-structures in the background so that they remain 
sufficiently updated as nodes get joined and split. In principle, this is a standard task 
(see e.g. jSZj). Yet it is a highly delicate puzzle which is typically either not done (e.g. 
Fredman and Willard [18] only claimed amortized bounds for their original fusion trees), 
or done with rather incomplete sketches (e.g. Willard [36] only presents a 2-page sketch of 
his de-amortization of fusion trees). Furthermore, our exponential search trees pose a new 
complication; namely that when we join or split, we have to rebuild, not only the S-structures 
of the nodes being joined or split, but also the S-structures of their parent. For contrast, 
when Willard [36] de-amortizes fusion trees, he actually uses the "atomic heaps" from ^H] 
as S-structures, and these atomic heaps support insert and delete in constant time. Hence, 
when nodes get joined or split, he can just delete or insert the splitter between them directly 
in the parent's S-structure, without having to rebuild it. 

In this section, we will present a general quotable theorem about rebuilding, thus making 
proper de-amortization much easier for future authors. 

3.1 Join and split with postprocessing 

As mentioned, we are going to deal generally with multiway trees where joins and splits can- 
not be completed in constant time. For the moment, our trees are only described structurally 
with a children list for each non-leaf node. Then joins and splits can be done in constant 
time. However, after each join or split, we want to allow for some unspecified postprocessing 
before the involved nodes can participate in new joins and splits. This postprocessing time 
will, for example, be used to update parent pointers and S-structures. 

The key issue is to schedule the postprocessing, possibly involving reconstruction of 
static data structures, so that we obtain good worst-case bounds. We do this by dividing 
each postprocessing into local update steps and ensuring that each update only uses a few 
local update steps at the same time as each postprocessing is given enough steps to complete. 
The schedule is independent of how these update steps are performed. 

To be more specific in our structural description of a tree, let u be the predecessor of v in 
their parent's children list C. A join of u and v means that we append the children list of v 
to that of u so that u adopts all these children from v. Also, we delete v from C. Similarly, 
a split of a node u at its child w means that we add a new node v after u in the children 
list C of u's parent, that we cut the children list of u just before w, and make the last part 
the children list of v. Structural splits and joins both take constant time and are viewed 
as atomic operations. In the postprocessing of a join, the resulting node is not allowed to 
participate in any joins or splits. In the postprocessing of a split, the resulting nodes are 
neither allowed to participate directly in a join or split, nor is the parent allowed to split 
between them. 
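
The structural join and split just described can be sketched as follows. The class and function names are hypothetical, and Python lists are used for the children lists purely for brevity, so these versions take time linear in the list lengths; the constant-time claim in the text presumes linked children lists. Postprocessing and the restrictions on participating nodes are not modelled.

```python
class TreeNode:
    """Node described only structurally, by its children list."""

    def __init__(self, children=None):
        self.children = children if children is not None else []


def join(parent, u, v):
    """Join u with its successor v: u adopts v's children and v leaves parent's list."""
    idx = parent.children.index(u)
    assert parent.children[idx + 1] is v, "v must directly follow u"
    u.children.extend(v.children)
    v.children = []
    del parent.children[idx + 1]


def split(parent, u, w):
    """Split u at its child w: a new node v after u takes w and all later children."""
    cut = u.children.index(w)
    v = TreeNode(u.children[cut:])
    del u.children[cut:]
    parent.children.insert(parent.children.index(u) + 1, v)
    return v
```

For example, joining nodes with children `[a, b, c]` and `[d, e]` and then splitting the result at `d` restores the original two children lists.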

We are now in the position to present our general theorem on worst-case bounds for 
joining and splitting with postprocessing: 
Theorem 8 Given a number series $n_1, n_2, \ldots$, with $n_1 \ge 84$, $n_{i+1} > 18 n_i$, we can schedule 
splits and joins to maintain a multiway tree where each non-root node v on height $i > 0$ has 
weight between $n_i/4$ and $n_i$. A root node on height $h > 0$ has weight at most $n_h$ and at least 
2 children. The schedule gives the following properties: 

(i) When a leaf v is inserted or deleted, for each node u on the path from v to the root 
the schedule uses one local update step contributing to the postprocessing of at most one join 
or split involving either u or a neighbor of u. 

(ii) For each split or join at level i the schedule ensures that we have $n_i/84$ local update 
steps available for postprocessing, including one at the time of the split or join. 

(iii) If the time of a local update step on level i is bounded by $t_i = \Omega(1)$, each update is 
supported in $O(\sum_{i=1}^{h} t_i)$ time. 


In our exponential search tree, we will have $t_i = O(1)$, but $t_i = \omega(1)$ has been useful in 
connection with priority queues [5]. 

The above numbers ensure that a node which is neither root nor leaf has at least 
$(n_i/4)/n_{i-1} \ge 18/4 > 4$ children. If the root node is split, a new parent root is generated 
implicitly. Conversely, if the root's children join to a single child, the root is deleted 
and the single child becomes the new root. The proof of Theorem 8 is rather delicate, and 
deferred till later. Below we show how to apply Theorem 8 in exponential search trees. As 
a first simple application of the schedule, we show how to maintain parents. 

Lemma 9 In Theorem 8, the parent of any node can be computed in constant time.

Proof: With each node, we maintain a parent pointer, which points to the true parent,
except, possibly, during the postprocessing of a split or join. Splits and joins are handled
equivalently. Consider the case of a join of u and v into u. During the postprocessing, we
will redirect all the parent pointers of the old children of v to point to u. Meanwhile, we will
have a forward pointer from v to u so that parent queries from any of these children can be
answered in constant time, even if the child still points to v.

Suppose that the join is on level $i$. Then v could not have more than $n_i$ children. Hence,
if we redirect 84 of their parent pointers in each local update step, we will be done by the end
of the postprocessing of Theorem 8. The redirections are done in a traversal of the children
list, starting from the old first child of v. One technical detail is, however, that we may have
joins and splits in the children sequence. Joins are not a problem, but for splits we make the
rule that if we split u' into u' and v', then v' inherits the parent pointer of u'. Also, we make
the split rule that if the parent pointer of u' is to a node like v with a forward pointer to
a node like u that v is being joined to, we redirect the next child in the above redirection
traversal from v. This way we make sure that the traversal of the old children list of v is not
delayed by splits in the list. ■
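
The forwarding idea in the proof of Lemma 9 can be sketched in a few lines. This is only an illustration under stated assumptions: the class `Node`, its fields `parent` and `forward`, and the helper `redirect_some` are hypothetical names, and the sketch models the lookup and one batch of redirections, not the scheduling of Theorem 8 or the extra rules for splits in the children sequence.

```python
class Node:
    """Hypothetical tree node; `parent` may be stale while a join is postprocessed."""
    def __init__(self, parent=None):
        self.parent = parent
        self.forward = None  # set on v while v is being joined into u


def true_parent(child):
    """Return the current parent in O(1), following at most one forward pointer."""
    p = child.parent
    if p is not None and p.forward is not None:
        return p.forward
    return p


def redirect_some(old_children, u, start, count=84):
    """One local update step: redirect up to `count` stale parent pointers to u.

    Returns the position where the next step should continue.
    """
    end = min(start + count, len(old_children))
    for c in old_children[start:end]:
        c.parent = u
    return end
```

A parent query is answered correctly both before and after a child has been redirected, which is the property the proof relies on.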

For our exponential search trees, we will use the postprocessing for rebuilding S-structures.
We will still keep a high level of generality to facilitate other applications, such as, for
example, a worst-case version of the original fusion trees of Fredman and Willard.

Corollary 10 Given a number series $n_0, n_1, n_2, \ldots$, with $n_0 = 1$, $n_1 \geq 84$, $n_{i+1} > 18n_i$, we
maintain a multiway tree where each node on height $i$ which is neither the root nor a leaf
node has weight between $n_i/4$ and $n_i$. If an S-structure for a node on height $i$ can be built
in $O(n_{i-1} t_i)$ time, $t_i = \Omega(1)$, we can maintain S-structures for the whole tree in
$O(\sum_{i=1}^{h} t_i)$ time per finger update.

Proof: In Section 2, we described how a finger update, in constant time, translates into the
insertion or deletion of a leaf. We can then apply Theorem 8.

Our basic idea is that we have an ongoing periodic rebuilding of the S-structure at each
node v. A period starts by scanning the splitter list of the children of v in $O(n_i/n_{i-1})$ time.
It then creates a new S-structure in $O(n_{i-1} t_i)$ time, and finally, in constant time, it replaces
the old S-structure with the new S-structure. The whole rebuilding is divided into $n_{i-1}/160$
steps, each taking $O(t_i)$ time.

Now, every time an update contributes to a join or split postprocessing on level $i-1$,
we perform one step in the rebuilding of the S-structure of the parent p, which is on level $i$.
Then Theorem 8 ascertains that we perform $n_{i-1}/84$ steps on S(p) during the postprocessing,
and hence we have at least one complete rebuilding of S(p) with(out) the splitter created
(removed) by the split (join).

When two neighboring nodes u and v on level $i-1$ join, the next rebuilding of S(u)
will automatically include the old children of v. The rebuilding of S(v) is continued for all
updates belonging under the old v until S(u) is ready to take over, but these updates
will also promote the rebuilding of S(u). This way we make sure that the children of v and u
do not experience any delay in the rebuilding of their parent's S-structure. Note that S(u) is
completely rebuilt in $n_{i-2}/84$ steps, which is much less than the $n_{i-1}$ steps we have available
for the postprocessing.

During the postprocessing of the join, we may, internally, have to forward keys between
u and v. More precisely, if a key arrives at v from the parent's S-structure and S(u) has been
updated to take over S(v), the key is sent through S(u). Conversely, if S(v) is still in use and a
key arriving at u is larger than or equal to the splitter of v, it is sent through S(v).

The split is implemented using the same ideas: all updates for the two new neighboring
nodes u and v promote both S(u) and S(v). For S(u), we finish the current rebuilding over
all the children before doing a rebuild excluding the children going to v. By the end of the
latter rebuild, S(v) will also have been completed. If a key arrives at v and S(v) is not ready,
we send it through S(u). Conversely, if S(v) is ready and a key arriving at u is larger than
or equal to the splitter of v, the key is sent through S(v). ■

Below we establish some simple technical lemmas verifying that Corollary 10 applies to the
exponential search trees from Definition 7. The first lemma shows that the number sequences
$n_i$ match.

Lemma 11 With $a = \max\{84^{(k-1)/k}, 18^{(k-1)^2/k}\}$ and $n_i = a^{(1+1/(k-1))^i}$ as in Definition 7,
$n_1 \geq 84$ and $n_{i+1}/n_i \geq 18$ for $i \geq 1$.

Proof: $n_1 \geq (84^{(k-1)/k})^{1+1/(k-1)} = 84$ and
$n_{i+1}/n_i = n_i^{1/(k-1)} \geq n_1^{1/(k-1)} = a^{k/(k-1)^2} \geq 18$. ■
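
The two inequalities of Lemma 11 can be checked numerically for a few values of $k$. The sketch below assumes the formula $n_i = a^{(1+1/(k-1))^i}$ as stated in the lemma; the function name `level_sizes` is hypothetical, and the small tolerance in the checks only compensates for floating-point rounding, since the bounds are tight for some $k$.

```python
def level_sizes(k, levels):
    """Return [n_1, ..., n_levels] for the sequence used in Lemma 11."""
    a = max(84 ** ((k - 1) / k), 18 ** ((k - 1) ** 2 / k))
    growth = 1 + 1 / (k - 1)
    return [a ** (growth ** i) for i in range(1, levels + 1)]


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        n = level_sizes(k, 4)
        ratios = [n[i + 1] / n[i] for i in range(len(n) - 1)]
        print(k, n[0], min(ratios))
```

For $k = 2$ the first bound is tight ($n_1 = 84$ up to rounding), while for larger $k$ the ratio bound is the tight one.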






Next, we show that the S-structures are built fast enough.

Lemma 12 With Definition 7, creating an S-structure for a node u on level $i$ takes $O(n_{i-1})$
time, and the total cost of a finger update is $O(\log\log n)$.

Proof: Since u has degree at most $4n_i^{1/k}$, the creation takes $O((4n_i^{1/k})^{k-1}) = O(n_i^{1-1/k}) =
O(n_{i-1})$ time. Thus, we get $t_i = O(1)$ in Corollary 10, corresponding to a finger update time
of $O(\log\log n)$. ■

Since $S(n) = \Omega(1)$, any time bound derived from Theorem 1 is $\Omega(\log\log n)$, dominating our
cost of a finger update.

Next we give a formal proof that the recursion formula of Theorem 1 holds.

Lemma 13 Assuming that the cost for searching in a node of degree $d$ is $O(S(d))$, the
search time for an $n$ key exponential search tree from Definition 7 is bounded by $T(n) \leq
T(n^{1-1/k}) + O(S(n))$ for $n = \omega(1)$.

Proof: Since $n_{i-1} = n_i^{1-1/k}$ and since the degree of a level $i$ node is at most $4n_i^{1/k}$, the search
time starting just below the root at level $h-1$ is bounded by $T'(n_{h-1})$ where $n_{h-1} < n$ and
$T'(m) \leq T'(m^{1-1/k}) + O(S(4m^{1/k}))$. Moreover, for $m = \omega(1)$, $4m^{1/k} < m$, so $O(S(4m^{1/k})) =
O(S(m))$.

The degree of the root is bounded by $n$, so the search time of the root is at most $S(n)$.
Hence our total search time is bounded by $S(n) + T'(n_{h-1}) = O(T(n))$. Finally, the $O$ in
$O(T(n))$ is superfluous because of the $O$ in $O(S(n))$. ■
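
To get a feeling for the recursion in Lemma 13, one can count how many times the map $n \mapsto n^{1-1/k}$ is applied before the argument drops to a constant. The sketch below is an illustration only; the function name `recursion_levels` and the cut-off `base` are assumptions made for the example, and floating-point arithmetic is used for simplicity.

```python
def recursion_levels(n, k, base=2.0):
    """Count applications of n -> n**(1 - 1/k) until the value is at most `base`."""
    levels = 0
    x = float(n)
    while x > base:
        x = x ** (1 - 1 / k)
        levels += 1
    return levels


if __name__ == "__main__":
    for bits in (16, 32, 64):
        print(bits, recursion_levels(2 ** bits, 5))
```

The number of levels grows like $\log\log n$: doubling the number of bits of $n$ adds only a constant number of levels.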

Finally, recall that our original analysis, showing that exponential search trees used linear 
space, was not complete. Below comes the formal proof. 

Lemma 14 The exponential search trees from Definition 7 use linear space.

Proof: Consider a node v at height $i$. The number of keys below v is at least $n_i/4$. Since
v has degree at most $4n_i^{1/k}$, the space of the S-structure by v is $O((4n_i^{1/k})^{k-1}) = O(n_i^{1-1/k})$.
Distributing this space on the keys descending from v, we get $O(n_i^{-1/k})$ space per key.

Conversely, for a given key, the space attributed to the key by its ancestors is

$$O\left(\sum_{i=0}^{h} n_i^{-1/k}\right) = O(1).$$

The above lemmas establish that Theorem 1 holds if we can prove Theorem 8.

3.2 A game of weight balancing

In order to prove Theorem 8, we consider a game on lists of weights. In relation to Theorem
8, each list represents the weights of the children of a node on some fixed level. The purpose
of the game is to crystallize what is needed for balancing on each level. Later, in a bottom-up
induction, we will apply the game to all levels.

First we consider only one list.

Our goal is to maintain balance, that is, for some parameter $b$, called the "latency", all
weights should be of size $\Theta(b)$. An adversary can update an arbitrary weight, incrementing
or decrementing it by one. Every time the adversary updates a weight, we get to work locally
on balance. We may join neighboring weights $w_1$ and $w_2$ into one $w = w_1 + w_2$, or split a
weight $w$ into two, $w_1$ and $w_2$, with $w_1 + w_2 = w$.

Each join or split takes $b$ steps. A join, replacing $w_1 w_2$ with $w = w_1 + w_2$, takes place
instantly, but requires a $b$ step postprocessing. A split, replacing $w$ with $w_1 w_2$, $w_1 + w_2 = w$,
happens at a time chosen by the adversary during the $b$ steps. The adversary must fulfill that
$|w_1 - w_2| \leq \Delta b$, where $\Delta$ is called the "split error". This should be satisfied from the time the split
is done until the split process is completed.

Every time a weight involved in a join or split process is updated, we get to do one step
on the process. However, we may also need to work on joining or splitting of neighboring
weights. More precisely, a weight is said to be free if it is not involved in a split or join process.
In order to ensure that free weights do not get too small, we need a way of requesting that
they soon be involved in a join or split process. To this end, we introduce tying: if a free
weight v has an involved neighbor w, we may tie v to w, and then each update to v will
progress the process on w by one step.

Recall again that we are really working on a family of lists which the adversary may cut
and concatenate. We note that the adversary may only do one operation at a time: either
an update or a cut or a split, and for each operation, we get time to respond with an update
step. Our only restriction on the adversary is that it is not allowed to cut between weights
involved in a join or split process, or cut off a weight tied to such a process.

Proposition 15 Let $\mu > 1$. Let $b$ be the "latency", and $\Delta$ be the "split error". A list is
"neutral" if all weights are strictly between $(\mu + 3)b$ and $(2\mu + \Delta + 9)b$. We start with neutral
lists, and neutral lists can be added and removed at any time. As long as all lists have a
total weight $> (\mu + 3)b$, there is a protocol guaranteeing that each weight is between $\mu b$
and $(3\mu + \Delta + 14)b$, and that the total weight of any uncuttable segment of a list is at most
$(5\mu + \Delta + 19)b$.

In particular, for $\Delta = 7$ and $\mu = 21$, with start weights strictly between $24b$ and $58b$,
we guarantee that the weights stay between $21b$ and $84b$ and that the maximum uncuttable
segment is of size at most $131b$.
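
The concrete numbers follow directly from the general expressions in Proposition 15. The small sketch below evaluates them; the function name `protocol_bounds` and the dictionary keys are hypothetical, and all values are expressed as multiples of the latency $b$.

```python
def protocol_bounds(mu, delta):
    """Evaluate the bounds of Proposition 15, in units of the latency b."""
    return {
        "neutral_low": mu + 3,                 # neutral weights are strictly above this
        "neutral_high": 2 * mu + delta + 9,    # ... and strictly below this
        "min_weight": mu,                      # guaranteed lower bound on any weight
        "max_weight": 3 * mu + delta + 14,     # guaranteed upper bound on any weight
        "max_uncuttable": 5 * mu + delta + 19, # bound on any uncuttable segment
    }


if __name__ == "__main__":
    print(protocol_bounds(mu=21, delta=7))
```

With `mu=21` and `delta=7` this yields 24, 58, 21, 84, and 131, matching the values quoted in the text.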

In our application, we define $B = 84b$, so the base segments are of size between $\frac{1}{4}B$ and $B$, and
the uncuttable segments are of size below $2B$. A list is then neutral if all weights are between
$\frac{2}{7}B < \frac{1}{3}B$ and $\frac{29}{42}B > \frac{2}{3}B$. We are then given $b = \frac{1}{84}B$ update steps to perform a join or
split, and during a split, the two new weights should differ by at most $\frac{1}{12}B$.

As a first step in the proof of Proposition 15, we present the concrete protocol itself.

(a) If a free weight gets up to $sb$, $s = 2\mu + \Delta + 9$, we split it. (Recall that we even allow
an adversary to postpone the event when the split is actually made.)

(b) If a free weight v gets down to $mb$, $m = \mu + 3$, and has a free neighbor w, we join v
and w, untying w from any other neighbor.

(c) If a free weight v gets down to $mb$ and has no free neighbors, we tie it to any one of
its neighbors. If v later gets back up above $mb$, it is untied again.

(d) When we finish a join postprocessing, if the resulting weight is $\geq sb$, we immediately
split it. If a neighbor was tied to the joined node, the tie is transferred to the nearest
node resulting from the split.

If the weight v resulting from the join is $< sb$ and v is tied by a neighbor, we join with
the neighbor that tied v first.

(e) At the end of a split postprocessing, if any of the resulting nodes are tied by a neighbor,
it joins with that neighbor. Note here that since resulting nodes are not tied to each
other, there cannot be a conflict.

Note that our protocol is independent of legal cuts and concatenations by the adversary,
except in (c), which requires that a free weight getting down to $(\mu + 3)b$ has at least one
neighbor. This is, however, ensured by the condition from Proposition 15 that each list has
total weight strictly larger than $(\mu + 3)b$.
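
Rules (a) to (c) amount to a simple decision for a free weight after each update. The sketch below illustrates only that decision; the function name `free_weight_action` and its return values are hypothetical, and rules (d) and (e), which govern what happens when a join or split process finishes, are not modelled.

```python
def free_weight_action(weight, b, mu, delta, has_free_neighbor):
    """Decide what the protocol does with a free weight after an update.

    Returns "split", "join", "tie", or None (no action needed).
    """
    s = 2 * mu + delta + 9  # split threshold from rule (a), in units of b
    m = mu + 3              # join/tie threshold from rules (b) and (c), in units of b
    if weight >= s * b:
        return "split"
    if weight <= m * b:
        # Rule (b) if some neighbor is free, otherwise rule (c).
        return "join" if has_free_neighbor else "tie"
    return None
```

For example, with `b=1`, `mu=21` and `delta=7`, a free weight of 58 is split, a free weight of 24 is joined with a free neighbor if one exists and tied otherwise, and a weight of 40 is left alone.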

Lemma 16

(i) Each free weight is between $\mu b$ and $sb = (2\mu + \Delta + 9)b$.

(ii) The weight in a join process is between $(m + \mu - 1)b = (2\mu + 2)b > mb$ and $(s + m + 1)b =
(3\mu + \Delta + 13)b > sb$.

(iii) In a split process, the total weight is at most $(3\mu + \Delta + 14)b$ and the split weights are
between $((s - 1 - \Delta)/2)b = (\mu + 4)b > mb$ and $((s + m + 2 + \Delta)/2)b = (\frac{3}{2}\mu + \Delta + 7)b < sb$.

Proof: First we prove some simple claims.

Claim 16A If (i), (ii), and (iii) are true when a join process starts, then (ii) will remain
satisfied for that join process.

PROOF: For the upper bound, note that when the join is started, none of the involved
weights can be above $sb$, for then we would have split it. Also, a join has to be initiated
by a weight of size at most $mb$, so when we start, the total weight is at most $(s + m)b$, and
during the postprocessing, it can increase by at most $b$.

For the lower bound, both weights have to be at least $\mu b$. Also, the join is either initiated
as in (b) by a weight of size $mb$, or by a tied weight coming out from a join or split, which
by (ii) and (iii) is of size at least $mb$, so we start with a total of at least $(\mu + m)b$, and we
lose at most $b$ in the postprocessing. □

Claim 16B If (i), (ii), and (iii) are true when a split process starts, then (iii) will remain
satisfied for that split process.

PROOF: For the lower bound, we know that a split is only initiated for a weight of size at
least $sb$. Also, during the process, we can lose at most $b$, and since the maximal difference
between the two weights is $\Delta b$, the smaller is of size at least $((s - 1 - \Delta)/2)b$.

For the upper bound, the maximal weight we can start with is one coming out from a
join, which by (ii) is at most $(s + m + 1)b$. We can gain at most $b$ during the split processing,
so the larger weight is at most $((s + m + 2 + \Delta)/2)b$. □

We will now complete the proof of Lemma 16 by showing that there cannot be a first
violation of (i) given that (ii) and (iii) have not already been violated. The only way we can
possibly get a weight above $sb$ is one coming out from a join as in (ii), but then by (d) it is
immediately split, so it does not become free.

To show by contradiction that we cannot get a weight below $\mu b$, let w be the first weight
getting down below $\mu b$. When w was originally created, by (ii) and (iii), it was of size
$> mb$, so to get down to $\mu b$, there must have been a last time where it got down to $mb$. It
then tied itself to an involved neighboring weight w'. If w' is involved in a split, we know
that when w' is done, the half nearest w will immediately start joining with w as in (e).
However, if w' is involved in a join, when done, the resulting weight may start joining with
a weight w'' on the other side. In that case, however, w is the first weight to tie to the new
join. Hence, when the new join is done, either w starts joining with the result, or the result
gets split and then w will join with the nearest weight coming out from the split. In the worst
case, w will have to wait for two joins and one split to complete before it gets joined, and
hence it can lose at most $3b = (m - \mu)b$ while waiting to get joined. ■

Proof of Proposition 15: By Lemma 16, all weights remain between $\mu b$ and $(3\mu + \Delta + 13)b$.
Concerning the maximal size of an uncuttable segment, the maximal total weight involved
in a split or join is $(m + s + 1)b$, and by (b) we can have a weight of size at most $mb$ tied from
either side, adding up to a total of $(3m + s + 1)b = (5\mu + \Delta + 19)b$.

3.3 Applying the protocol 

We now want to apply our protocol in order to prove Theorem 8:

Given a number series $n_1, n_2, \ldots$, with $n_1 \geq 84$, $n_{i+1} > 18n_i$, we can schedule
splits and joins to maintain a multiway tree where each non-root node v on height
$i > 0$ has weight between $n_i/4$ and $n_i$. A root node on height $h > 0$ has weight
at most $n_h$ and at least 2 children. The schedule gives the following properties:

(i) When a leaf v is inserted or deleted, for each node u on the path from v to
the root, the schedule uses one local update step contributing to the postprocessing
of at most one join or split involving either u or a neighbor of u.

(ii) For each split or join at level $i$, the schedule ensures that we have $n_i/84$ local
update steps available for postprocessing.

(iii) If the time of a local update step on level $i$ is bounded by $t_i = \Omega(1)$, each
update is supported in $O(\sum_{i=1}^{h} t_i)$ time.

For each level $i < h$, the nodes are partitioned in children lists of nodes on level $i + 1$. We
maintain these lists using the scheduling of Proposition 15 with latency $b_i = n_i/84$ and split
error $\Delta = 7$. With $\mu = 21$, this will give weights between $21b_i$ and $84b_i$, as required. We need
to ensure that the children list of a node on level $i$ can be cut so that the halves differ by at
most $\Delta b_i$. For $i = 1$, this is trivial, in that the children list is just a list of leaves that can be
cut anywhere, that is, we are OK even with $\Delta = 1$. For $i > 1$, inductively, we may assume
that we have the required difference of $\Delta b_{i-1}$ on the level below, and then, using Proposition
15, we can cut the list on level $i$ with a difference of $131b_{i-1}$. However, $b_i > 18b_{i-1}$, so
$131b_{i-1} < 7b_i$, as required.

Having dealt with each individual level, three unresolved problems remain: 

• How do we, for splits, find a good place to cut the children list in constant time?

• How does the protocol apply as the height of the tree changes? 

• How do we actually find the nodes on the path from the leaf v to the root? 

Splitting in constant time For each node v on level $i$, our goal is to maintain a good
cut child in the sense that when cutting at that child, the lists will not differ by more than
$\Delta b_i$. We will always maintain the sum of the weights of the children preceding the cut child,
and comparing that with the weight of v tells us if it is at good balance. If an update
makes the preceding weight too large, we move to the next possible cut child to the right, and
conversely, if it gets too small, we move the cut child to the left. A possible cut is always at
most 4 children away, so the above shifts only take constant time. Similarly, if the cut child
stops being cuttable, we move in the direction that gives us the best balance.

When a new list is created by a join or a split, we need to find a new good cut child. To
our advantage, we know that we have at least $b_i$ update steps before the cut child is needed.
We can therefore start by making the cut child the leftmost child, and every time we receive
an update step for the join, we move to the right, stopping when we are in balance. Since the
children list is of length $O(n_i/n_{i-1})$, we only need to move a constant number of children to
the right in each update step in order to ensure balance before the postprocessing is ended.
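
To make the notion of a good cut child concrete, the following sketch computes the best cut of a children list from scratch. It is an illustration only: the function name `best_cut` is hypothetical, it scans the whole list in linear time instead of maintaining the cut incrementally as described above, and it ignores the restriction that some positions are not cuttable.

```python
def best_cut(child_weights):
    """Return (index, difference) for the cut that best balances the two halves.

    Cutting before position `index` puts child_weights[:index] to the left
    and child_weights[index:] to the right; `difference` is |left - right|.
    """
    total = sum(child_weights)
    best_index, best_diff = 0, total
    left = 0  # sum of the weights preceding the candidate cut child
    for i, w in enumerate(child_weights):
        diff = abs(total - 2 * left)
        if diff < best_diff:
            best_index, best_diff = i, diff
        left += w
    return best_index, best_diff
```

The running value `left` plays the role of the maintained sum of the weights of the children preceding the cut child.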

Changing the height A minimal tree has a root on height 1, possibly with 0 children.
If the root is on height $h$, we only apply the protocol when it has weight at least $21b_h$,
splitting it when the protocol tells us to do so. Note that there is no cascading effect, for
before the split, the root has weight at most $84b_h$, and this is the weight of the new root
at height $h + 1$. However, $b_h < b_{h+1}/18$, so it will take many updates before the new root
reaches the weight $21b_{h+1}$. The S-structure and pointers of the new root are created during
the postprocessing of the split of the old root. Conversely, we only lose a root at height
$h + 1$ when it has two children that get joined into one child. The cleaning up after the old
root, i.e., the removal of its S-structure and a constant number of pointers, is done in the
postprocessing of the join of its children. We note that the new root starts with weight at
least $21b_h$, so it has at least $21b_h/(84b_{h-1}) > 18/4 > 4$ children. Hence it will survive long
enough to pay for its construction.

Finding the nodes on the path from the leaf v to the root The obvious way to find
the nodes on the path from the leaf v to the root is to use parent pointers, which according
to Lemma 9 can be computed in constant time. Thus, we can prove Theorem 8 from Lemma
9. The only problem is that we used the schedule of Theorem 8 to prove Lemma 9. To break
the circle, consider the first time the statement of Theorem 8 or of Lemma 9 is violated. If
the first mistake is a mistaken parent computation, then we know that the scheduling and
weight balancing of Theorem 8 has not yet been violated, but then our proof of Lemma 9
based on Theorem 8 is valid, contradicting the mistaken parent computation. Conversely, if
the first mistake is in Theorem 8, we know that all parents computed so far were correct,
hence that our proof of Theorem 8 is correct. Thus there cannot be a first mistake, so we
conclude that both Theorem 8 and Lemma 9 are correct.

4 Static search structures 

In this section, we will prove Theorem 2:

In polynomial time and space, we can construct a deterministic data structure
over $d$ keys supporting searches in $O(\min\{\sqrt{\log d}, \log\log U, 1 + \frac{\log d}{\log W}\})$ time where
$W$ is the word length, and $U < 2^W$ is an upper bound on the largest key. If
we restrict ourselves to standard $AC^0$ operations, we can support searches in
$O((\log d)^{3/4+o(1)})$ worst-case time per operation.

To get the final bounds in Corollary 3, we actually need to improve the first bound in the
min-expression to $O(\sqrt{\log n/\log\log n})$ and the second bound to $O(\log\log U/\log\log\log U)$.
However, the improvement is by Beame and Fich [7]. We present our bounds here because
(i) they are simpler, and (ii) the improvement by Beame and Fich is based on our results.

4.1 An improvement of fusion trees 

Using our terminology, the central part of the fusion tree is a static data structure with the 
following properties: 

Lemma 17 (Fredman and Willard) For any $d$, $d = O(W^{1/6})$, a static data structure con-
taining $d$ keys can be constructed in $O(d^4)$ time and space, such that it supports neighbor
queries in $O(1)$ worst-case time.

Fredman and Willard used this static data structure to implement a B-tree where only
the upper levels in the tree contain B-tree nodes, all having the same degree (within a
constant factor). At the lower levels, traditional (i.e., comparison-based) weight-balanced
trees were used. The amortized cost of searches and updates is $O(\log n/\log d + \log d)$ for
any $d = O(W^{1/6})$. The first term corresponds to the number of B-tree levels and the second
term corresponds to the height of the weight-balanced trees.

Using an exponential search tree instead of the Fredman/Willard structure, we avoid the
need for weight-balanced trees at the bottom at the same time as we improve the complexity
for large word sizes.

Lemma 18 A static data structure containing $d$ keys can be constructed in $O(d^4)$ time and
space, such that it supports neighbor queries in $O(\frac{\log d}{\log W} + 1)$ worst-case time.

Proof: We just construct a static B-tree where each node has the largest possible degree
according to Lemma 17. That is, it has a degree of $\min(d, W^{1/6})$. This tree satisfies the
conditions of the lemma. ■

Corollary 19 There is a data structure occupying linear space for which the worst-case cost
of a search and update is $O(\frac{\log n}{\log W} + \log\log n)$.

Proof: Let $T(n)$ be the worst-case cost. Combining Theorem 1 and Lemma 18 gives that

$$T(n) = O\left(\frac{\log n}{\log W} + 1 + T(n^{4/5})\right). \qquad ■$$
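
Unrolling the recursion shows where the two terms of Corollary 19 come from. The following is a sketch, assuming that the recursion bottoms out at constant size after $O(\log\log n)$ levels:

```latex
T(n) = \sum_{j=0}^{O(\log\log n)} O\!\left(\frac{\log n^{(4/5)^j}}{\log W} + 1\right)
     = O\!\left(\frac{\log n}{\log W}\sum_{j \ge 0} (4/5)^j\right) + O(\log\log n)
     = O\!\left(\frac{\log n}{\log W} + \log\log n\right).
```

The first term is a geometric series dominated by its first summand, and the second term counts one unit per level of the recursion.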



4.2 Tries and perfect hashing 

In a binary trie, a node at depth $i$ corresponds to an $i$-bit prefix of one (or more) of the
keys stored in the trie. Suppose we could access a node by its prefix in constant time by
means of a hash table, i.e., without traversing the path down to the node. Then, we could
find a key $x$, or $x$'s nearest neighbor, in $O(\log W)$ time by a binary search for the node
corresponding to $x$'s longest matching prefix. At each step of the binary search, we look in
the hash table for the node corresponding to a prefix of $x$; if the node is there we try with a
longer prefix, otherwise we try with a shorter one.
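
The binary search over prefix lengths can be sketched as follows. This is only an illustration under stated assumptions: Python sets stand in for the hash tables, keys are non-negative integers of at most `W` bits, and the names `build_prefix_tables` and `longest_matching_prefix` are hypothetical. The search relies on the fact that if the prefix of length $i$ of $x$ occurs in the trie, then so do all shorter prefixes of $x$.

```python
def build_prefix_tables(keys, W):
    """tables[i] holds the i-bit prefixes of all keys, for i = 0..W."""
    tables = [set() for _ in range(W + 1)]
    for key in keys:
        for i in range(W + 1):
            tables[i].add(key >> (W - i))
    return tables


def longest_matching_prefix(tables, x, W):
    """Length of the longest prefix of x shared with some stored key.

    Uses O(log W) table lookups; assumes at least one key is stored,
    so that the empty prefix (length 0) always matches.
    """
    lo, hi = 0, W
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if (x >> (W - mid)) in tables[mid]:
            lo = mid       # prefix present: try a longer one
        else:
            hi = mid - 1   # prefix absent: try a shorter one
    return lo
```

In the sketch, building the tables takes $O(dW)$ set insertions, in line with the node count of a plain binary trie mentioned in the text.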

The idea of a binary search for a matching prefix is the basic principle of the van Emde
Boas tree [33]. However, a van Emde Boas tree is not just a plain binary trie
represented as above. One problem is the space requirements; a plain binary trie storing $d$
keys may contain as many as $\Theta(dW)$ nodes. In a van Emde Boas tree, the number of nodes
is decreased to $O(d)$ by careful optimization.

In our application, $\Theta(dW)$ nodes can be allowed. Therefore, to keep things simple, we
use a plain binary trie.

Lemma 20 A static data structure containing $d$ keys and supporting neighbor queries
in $O(\log W)$ worst-case time can be constructed in $O(d^4)$ time and space. The implementa-
tion can be done without division.

Proof: We study two cases.

Case 1: $W > d^{1/3}$. Lemma 18 gives constant query cost.

Case 2: $W \leq d^{1/3}$. In $O(dW) = o(d^2)$ time and space we construct a binary trie of
height $W$ containing all $d$ keys. Each key is stored at the bottom of a path of length $W$ and
the keys are linked together. In order to support neighbor queries, each unary node contains
a neighbor pointer to the next (or previous) leaf according to the inorder traversal.

To allow fast access to an arbitrary node, we store all nodes in a perfect hash table such
that each node of depth $i$ is represented by the $i$ bits on the path down to the node. Since
the paths are of different length, we use $W$ hash tables, one for each path length. Each hash
table contains at most $d$ nodes. The algorithm by Fredman, Komlós, and Szemerédi
constructs a hash table of $d$ keys in $O(d^3 W)$ time. The algorithm uses division; this can be
avoided by simulating each division in $O(W)$ time. With this extra cost, and since we use $W$
tables, the total construction time is $O(d^3 W^3) = O(d^4)$ while the space is $O(dW) = o(d^2)$.

With this data structure, we can search for a key $x$ in $O(\log W)$ time by a binary search
for the node corresponding to $x$'s longest matching prefix. This search either ends at the
bottom of the trie or at a unary node, from which we find the closest neighboring leaf by
following the node's neighbor pointer.

During a search, evaluation of the hash function requires integer division. However, as
pointed out by Knuth [23], division with some precomputed constant $p$ may essentially be
replaced by multiplication with $1/p$. Having computed $r = \lfloor 2^W/p \rfloor$ once in $O(W)$ time, we
can compute $x$ div $p$ as $\lfloor xr/2^W \rfloor$, where the last division is just a right shift by $W$ positions.
Since $\lfloor x/p \rfloor - 1 \leq \lfloor xr/2^W \rfloor \leq \lfloor x/p \rfloor$, we can compute the correct value of $x$ div $p$ by an
additional test. Once we can compute div, we can also compute mod. ■
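
The division-free computation of $x$ div $p$ can be sketched as follows. This is a minimal illustration, assuming $0 \leq x < 2^W$ and $1 \leq p < 2^W$; the function names are hypothetical. For brevity, the reciprocal $r$ is precomputed here with Python's own integer division, whereas the text computes it once in $O(W)$ time by other means; the point of the sketch is that the per-query work uses only multiplication, a shift, and one comparison.

```python
def make_reciprocal(p, W):
    """Precompute r = floor(2^W / p) once per constant p (done with // here)."""
    return (1 << W) // p


def div_by_constant(x, r, p, W):
    """Compute x div p using a multiplication, a right shift, and one test."""
    q = (x * r) >> W          # equals floor(x/p) or floor(x/p) - 1
    if (q + 1) * p <= x:      # the additional test corrects the off-by-one case
        q += 1
    return q


def mod_by_constant(x, r, p, W):
    """Compute x mod p from the quotient, again without a division per query."""
    return x - div_by_constant(x, r, p, W) * p
```

The single correction step suffices because, for $x < 2^W$, the estimate $\lfloor xr/2^W \rfloor$ is never more than one below $\lfloor x/p \rfloor$.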

An alternative method for perfect hashing without division is the one recently developed
by Raman. Not only does this algorithm avoid division, it is also asymptotically faster,
$O(d^2 W)$.

Corollary 21 There is a data structure occupying linear space for which the worst-case cost
of a search and the amortized cost of an update is $O(\log W \log\log n)$.

Proof: Let $T(n)$ be the worst-case search cost. Combining Lemmas 13 and 20 gives $T(n) =
O(\log W) + T(n^{4/5})$. ■

4.3 Finishing the proof of Theorem [2] 

If we combine Lemmas 18 and 20, we can in polynomial time construct a dictionary over $d$
keys supporting searches in time $S(d)$, where

$$S(d) = O\left(\min\left\{1 + \frac{\log d}{\log W},\ \log W\right\}\right). \qquad (3)$$

Furthermore, balancing the two parts of the min-expression gives

$$S(d) = O(\sqrt{\log d}).$$

To get the $AC^0$ bound in Theorem 2, we combine some known results. From Andersson's packed
B-trees, it follows that if in polynomial time and space, we build a static $AC^0$ dictionary
with membership queries in time $t$, then in polynomial time and space, we can build a static
search structure with operation time $O(\min_i\{it + \log n/i\})$. In addition, Brodnik et al.
have shown that such a static dictionary, using only standard $AC^0$ operations, can be built
with membership queries in time $t = O((\log n)^{1/2+o(1)})$. We get the desired static search
time by setting $i = O((\log n)^{1/4+o(1)})$. This completes the proof of Theorem 2, hence
of Corollary 3.
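
Both balancing steps in this subsection can be checked by short calculations. The following is a sketch that ignores constant factors; the case split on $\log W$ is one way to organize the first argument and is not taken from the source, and the choice $i = (\log n)^{1/4}$ is one admissible setting of the parameter.

```latex
% Balancing the two bounds obtained from Lemmas 18 and 20:
\min\Bigl\{1 + \frac{\log d}{\log W},\ \log W\Bigr\} \le 1 + \sqrt{\log d},
\quad\text{since}\quad
\begin{cases}
\log W \le \sqrt{\log d} & \Rightarrow\ \text{the second term is at most } \sqrt{\log d},\\[2pt]
\log W > \sqrt{\log d} & \Rightarrow\ \dfrac{\log d}{\log W} < \sqrt{\log d}.
\end{cases}

% Balancing i t + (log n)/i with t = (log n)^{1/2+o(1)} and i = (log n)^{1/4}:
i\,t + \frac{\log n}{i}
  = (\log n)^{3/4+o(1)} + (\log n)^{3/4}
  = (\log n)^{3/4+o(1)}.
```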

4.4 Two additional notes on searching 

Firstly, we give the first deterministic polynomial-time (in n) algorithm for constructing a 
linear space static dictionary with 0(1) worst-case access cost (cf. perfect hashing). 

As mentioned earlier, a linear space data structure that supports member queries (neigh-
bor queries are not supported) in constant time can be constructed at a worst-case cost
$O(n^2 W)$ without division, using Raman's algorithm. We show that the dependency on word
size can be removed.

Proposition 22 A linear space static data structure supporting member queries at a worst-
case cost of $O(1)$ can be constructed in $O(n^{2+\epsilon})$ worst-case time. Both construction and
searching can be done without division.

Proof: W.l.o.g. we assume that $\epsilon < 1/6$.

Since Raman has shown that a perfect hash function can be constructed in $O(n^2 W)$ time
without division, we are done for $n \geq W^{1/\epsilon}$.

If, on the other hand, $n < W^{1/\epsilon}$, we construct a static tree of fusion tree nodes with
degree $O(n^{1/3})$. This degree is possible since $\epsilon < 1/6$. The height of this tree is $O(1)$, the
cost of constructing a node is $O(n^{4/3})$ and the total number of nodes is $O(n^{2/3})$. Thus, the
total construction cost for the tree is $O(n^2)$.

It remains to show that the space taken by the fusion tree nodes is $O(n)$. According
to Fredman and Willard, a fusion tree node of degree $d$ requires $\Theta(d^2)$ space. This space
is occupied by a lookup table where each entry contains a rank between 0 and $d$. A space
of $\Theta(d^2)$ is small enough for the original fusion tree as well as for our exponential search
tree. However, in order to prove this proposition, we need to reduce the space taken by a
fusion tree node from $\Theta(d^2)$ to $\Theta(d)$. Fortunately, this reduction is straightforward. We
note that a number between 0 and $d$ can be stored in $\log d$ bits. Thus, since $d < W^{1/6}$, the
total number of bits occupied by the lookup table is $O(d^2 \log d) = O(W)$. This packing of
numbers is done cost efficiently by standard techniques.

We conclude that instead of $\Theta(d^2)$, the space taken by the lookup table in a fusion tree
node is $O(1)$ ($O(d)$ would have been good enough). Therefore, the space occupied by a fusion
tree node can be made linear in its degree. ■
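
The packing of small ranks into a single word can be illustrated as follows. This is a sketch of one standard technique, not necessarily the one the authors have in mind; the function names are hypothetical, a Python integer stands in for a machine word, and each rank in the range 0 to $d$ is given `d.bit_length()` bits, which is the exact number of bits needed.

```python
def pack_ranks(ranks, bits):
    """Pack a list of small non-negative integers into one integer, `bits` bits each."""
    word = 0
    for i, r in enumerate(ranks):
        assert 0 <= r < (1 << bits), "rank does not fit in its field"
        word |= r << (i * bits)
    return word


def unpack_rank(word, i, bits):
    """Extract the i-th packed rank with a shift and a mask, in constant time."""
    return (word >> (i * bits)) & ((1 << bits) - 1)
```

With $d^2$ entries of $O(\log d)$ bits each, the packed table occupies $O(d^2 \log d)$ bits, which is the quantity bounded by $O(W)$ in the proof.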

Secondly, we show how to adapt our data structure to certain input distributions.
In some applications, we may assume that the input distribution is favorable. This
kind of assumption may lead to a number of heuristic algorithms and data structures whose
analyses are based on probabilistic methods. Typically, the input keys may be assumed to be
generated as independent stochastic variables from some (known or unknown) distribution;
the goal is to find an algorithm with a good expected behavior. For these purposes, a
deterministic algorithm is not needed.

However, instead of modeling input as the result of a stochastic process, we may char- 
acterize its properties in terms of a measure. Attention is then moved from the process of 
generating data to the properties of the data itself. In this context, it makes sense to use a 
deterministic algorithm; given the value of a certain measure the algorithm has a guaranteed 
cost. 

We give one example of how to adapt our data structure according to a natural measure. 
An indication of how "hard" it is to search for a key is how large a part of it must be read in
order to distinguish it from the other keys. We say that this part is the key's distinguishing
prefix. (In Section 4.2 we used the term longest matching prefix for essentially the same
entity.) For W-bit keys, the longest possible distinguishing prefix is of length W. Typically,
if the input is nicely distributed, the average length of the distinguishing prefixes is O(log n).

As stated in Proposition 23, we can search faster when a key has a short distinguishing
prefix.

Proposition 23 There exists a linear-space data structure for which the worst-case cost of
a search and the amortized cost of an update is $O(\log b \cdot \log\log n)$, where $b \le W$ is the length
of the query key's distinguishing prefix, i.e., the prefix that needs to be inspected in order to
distinguish it from each of the other stored keys.

Proof: We use exactly the same data structure as in Corollary!^ with the same restructur- 
ing cost of 0(log log n) per update. The only difference is that we change the search algorithm 
from the proof of Lemma EDI Applying an idea of Chen and Reif i we replace the bi- 
nary search for the longest matching (distinguishing) prefix by an exponential-and-binary 
search. Then, at each node in the exponential search tree, the search cost will decrease from
$O(\log W)$ to $O(\log b)$ for a key with a distinguishing prefix of length b. ■
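The following is a minimal sketch of an exponential-then-binary search for the length of the longest matching prefix. It assumes that all prefixes of the stored W-bit keys sit in a hash set, which stands in for whatever prefix lookup the referenced lemma provides. The helper names and the probe counter are hypothetical and only serve the illustration.

```python
def build_prefix_set(keys, W):
    """All (length, prefix value) pairs of the stored W-bit keys."""
    return {(L, k >> (W - L)) for k in keys for L in range(W + 1)}


def longest_matching_prefix(y, prefixes, W):
    """Return (b, probes): b is the largest L such that the first L bits of y
    are a prefix of some stored key, found with O(log b) set lookups."""
    probes = 0

    def matches(L):
        nonlocal probes
        probes += 1
        return (L, y >> (W - L)) in prefixes

    # Exponential phase: double the candidate length until a probe fails.
    lo, hi = 0, 1            # invariant: the prefix of length lo matches
    while hi <= W and matches(hi):
        lo, hi = hi, hi * 2
    hi = min(hi, W + 1)      # now hi is W + 1 or a length that does not match

    # Binary phase: narrow the interval (lo, hi) down to the boundary.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if matches(mid):
            lo = mid
        else:
            hi = mid
    return lo, probes
```

Both phases use a number of probes logarithmic in the answer b rather than in W, which is the saving the proposition relies on.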

5 Finger search and finger updates 

Recall that we have a finger pointing at a key x while searching for another key y, and let 
q be the number of keys between x and y. W.l.o.g. we assume y > x. In its traditional 
formulation, the idea of finger search is that we should be able to find y quickly if q is 
small. Here, we also consider another possibility: the search should be fast if y - x is small.
Compared with the data structure for plain searching, we need some modifications to support
finger search and updates efficiently. The overall goal of this section is to prove the statement
of Theorem 5:

There is a fully-dynamic deterministic linear space search structure that supports
finger updates in constant time, and given a finger to a stored key x, searches a
key y > x in time

$$O\left(\min\left\{\sqrt{\log q/\log\log q},\ \log\log q \cdot \frac{\log\log(y-x)}{\log\log\log(y-x)},\ \log\log q + \frac{\log q}{\log W}\right\}\right)$$

where q is the number of stored keys between x and y. If we restrict ourselves to
$AC^0$ operations, we still get a bound of $O((\log q)^{3/4+o(1)})$.

Below, we will first show how to reduce the cost of finger updates from the O (log log n) in 
the last section to a constant. This will then be combined with efficient static finger search 
structures. 



5.1 Constant time finger update 

In this section, we will generally show how to reduce the finger update time of O(log log n)
from Lemma IT21 to a constant. The O(log log n) bound stems from the fact that when we insert
or delete a leaf, we use a local update step for each level above the leaf. Now, however,
we only want to use a constant number of local update steps in connection with each leaf
update. The price is that we have fewer local update steps available for the postprocessing of
joins and splits. More precisely, we will prove the following analogue to the general balancing
in Theorem |SJ 

Theorem 24 Given a number series $n_1, n_2, \ldots$, with $n_1 \ge 84$ and $18 n_i < n_{i+1} < n_i^2$ for $i \ge 1$,
we can schedule splits and joins to maintain a multiway tree where each non-root node v on
height $i > 0$ has weight between $n_i/4$ and $n_i$. A root node on height $h > 0$ has weight at
most $n_h$ and at least 2 children. The schedule gives the following properties:

(i) When a leaf v is inserted or deleted, the schedule uses a constant number of local
update steps. The additional time used by the schedule is constant.

(ii) For each split or join at level $i > 1$ the schedule ensures that we have at least $\sqrt{n_i}$
local update steps available for postprocessing, including one in connection with the split or
join itself. For level 1, we have $n_1$ local updates for the postprocessing.



As we shall see later, the $\sqrt{n_i}$ local update steps suffice for the maintenance of S-structures.
As for Theorem |H1, we have



Lemma 25 In Theorem 24, the parent of any node can be computed in constant time.



Proof: We use exactly the same construction as for Lemma 01. The critical point is that,
for the postprocessing, we have a number of updates which is proportional to the number of
children of a node. This is trivially the case for level 1, and for higher levels i, the number
of children is at most $n_i/(n_{i-1}/4) = O(\sqrt{n_i})$. ■

As in the proof of Theorem |HJ, we will actually use the parents of Lemma 25 in the proof of
Theorem 24. As argued at the end of Section 3.3, this does not lead to a circularity.

Abstractly, we will use the same schedule for joins and splits as in the proof of Theorem |S|.
However, we will not perform as many local update steps during a join or split process.
Moreover, the structural implementation of a join or split will await its first local update.

We note that level 1 is exceptional, in that we need $n_1$ local updates for the split and join
postprocessing. This is trivially obtained if, with each leaf update, we make 84 local updates
on any postprocessing involving or tied to the parent. For any other level $i > 1$, we need
$\sqrt{n_i}$ local updates, which is what is obtained below.

The result will be achieved by a combination of techniques. We will use a tabulation
technique for the lower levels of the exponential search tree, and a scheduling idea of Levcopoulos
and Overmars [28] for the upper levels.

5.1.1 Constant update cost for small trees on the lower levels 

In this subsection, we will consider small trees induced by lower levels of the multiway tree
from Theorem 24.

One possibility for obtaining constant update cost for search structures containing a few
keys would have been to use atomic heaps [T^j. However, here we aim at a solution using
only $AC^0$ operations. We will use tabulation. A tabulation technique for finger updates was
also used by Dietz and Raman [T3j. They achieved constant finger update time and $O(\log q)$
finger search time, for q intermediate keys, in the comparison based model of computation.
However, their approach has a lower bound of $\Omega(\log q/\log\log q)$ as it involves ranks [20],
and would prevent us from obtaining our $O(\sqrt{\log q/\log\log q})$ bound. Finally, we note that
our target is the general schedule for multiway trees in Theorem 24, which is not restricted
to search applications.

Below, we present a schedule satisfying the conditions of Theorem 24, except that we need
tables for an efficient implementation.

Every time we insert or delete a leaf u, we will do 1000 local update steps from u. The 
place for these local update steps is determined based on a system of marking and unmarking 
nodes. To place a local update from a leaf u, we find its nearest unmarked ancestor v. We 
then unmark all nodes on the path from u to v and mark v. If v is involved in or tied to 
a join or split process, we perform one local update step on this process. If not, we check 
if the weight of v is such that it should split or join or tie to a neighboring split or join, as 
described in the schedule for Proposition We then mark the involved nodes and perform 
a local update step at each of them. 
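The marking rule can be sketched as follows. The sketch assumes that the leaf itself counts as a candidate for the nearest unmarked node, and that if every node up to the root is marked, the whole path is simply unmarked; both choices are assumptions of this illustration. The `potential` helper charges $2^i$ to every unmarked node on level i, so that each placed update lowers it by exactly one.

```python
class Node:
    def __init__(self, level, parent=None):
        self.level = level      # 0 for leaves
        self.parent = parent
        self.marked = False


def place_local_update(leaf):
    """Find the nearest unmarked node v on the path from leaf to the root,
    unmark the nodes below v on that path, mark v, and return v.
    Returns None (after unmarking the whole path) if all were marked."""
    path = []
    v = leaf
    while v is not None and v.marked:
        path.append(v)
        v = v.parent
    for w in path:
        w.marked = False
    if v is not None:
        v.marked = True
    return v


def potential(nodes):
    """Sum of 2**level over all unmarked nodes."""
    return sum(2 ** v.level for v in nodes if not v.marked)
```

On a single root-to-leaf path the marks behave like a binary counter: the level of the marked node follows the pattern 0, 1, 0, 2, 0, 1, 0, so a level i node is reached once in every $2^{i+1}$ updates from below.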

Lemma 26 For a split or join process on level i, we get at least $n_i/2^i$ local update steps.
Note that $n_i > 18^i$, so $n_i/2^i > \sqrt{n_i}$, as in Theorem 24.

Proof: First, using a potential function argument, we analyze how many times a level i node
v gets marked during p local updates from leaves below v. The potential of a marked node
is 0 while the potential of an unmarked node on level i is $2^i$. The sub-potential of v is the
sum of the potential of all nodes descending from or equal to v. Then, if an update below
v does not unmark v, it decreases the sub-potential by 1. On the other hand, if we unmark
v, we also unmark all nodes on a path from a leaf to v, so we increase the potential by
$2^{i+1} - 1$. When nodes are joined and split, the involved nodes are all marked so the potential
is not increased. The potential is always non-negative. Further, its maximum is achieved if
all nodes are unmarked. Since all nodes have degree at least 4, if all nodes are unmarked,
the leaves carry more than half of the potential. On the other hand, the number of
leaves below v is at most $n_i$, so the maximum potential is less than $2n_i$. It follows that the
number of times v gets unmarked is more than $(p - 2n_i)/2^{i+1}$. Hence v gets marked at least
$(p - 2n_i)/2^{i+1}$ times.

From Theorem[H] we know that during a split or join process, there are at least $n_i/84$ leaf
updates below the at most 4 level i nodes $v_0, v_1, v_2, v_3$ involved in or tied to the process. Each
of these leaf updates results in 1000 local updates. Thus, if $p_j$ is the number of local updates
from leaves below $v_j$ during our process, $p_0 + p_1 + p_2 + p_3 \ge 10 n_i$. Consequently, the number of
local updates for our process is at least

$$\sum_{j=0}^{3} (p_j - 2n_i)/2^{i+1} \ge (10 n_i - 8 n_i)/2^{i+1} = n_i/2^i,$$

as desired. ■

The above schedule, with the marking and unmarking of nodes to determine the local update
steps, could easily be implemented in time proportional to the height of the tree, which is
O(log log n). To get down to constant time, we will use tables of size o(n) to deal with small
trees with up to m nodes where $m = O(\sqrt{\log n})$. Here we think of n as a fixed capacity for
the total number of stored keys. As the number of actual keys changes by a factor of 2, we
can build a data structure with new capacity in the background.

Consider an exponential search tree E with at most m nodes. With every node, we are 
going to associate a unique index below m, which is given to the node when it is first created 
by a split. Indices are recycled when nodes disappear in connection with joins. We will have 
a table of size m that maps indices into the nodes in E. Conversely, with each node in E, 
we will store its index. In connection with an update, tables will help us find the index of
the node to be marked, and then the table gives us the corresponding node.

Together with the tree E, we store a bitstring $\tau_E$ representing the topology of E. More
precisely, $\tau_E$ represents the depth first search traversal of E where 1 means go down and
0 means go up. Hence, $\tau_E$ has length 2m - 2. Moreover, we have a table $\mu_E$ that maps
depth first search numbers of nodes into indices. Also, we have a table $\gamma_E$ that for every
node tells if it is marked. We call $\sigma_E = (\tau_E, \mu_E, \gamma_E)$ the signature of E. Note that we have
$< 2^{2m} \times m^m \times 2^m \times O(m) = m^{O(m)}$ different signatures.
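As a small illustration of the topology string, the sketch below walks a tree depth first and emits 1 for each step down and 0 for each step up, together with the depth first order of the nodes. The dictionary-of-children representation and the function name are assumptions of this sketch.

```python
def signature_topology(children, root):
    """Return (bits, order): bits is the down/up string of a depth first
    traversal (1 = go down, 0 = go up) and order lists the nodes by their
    depth first search number. `children` maps a node to its ordered children."""
    bits, order = [], [root]

    def visit(v):
        for c in children.get(v, ()):
            bits.append(1)      # step down to child c
            order.append(c)
            visit(c)
            bits.append(0)      # step back up to v

    visit(root)
    return bits, order
```

For a tree with m nodes the string has one down bit and one up bit per edge, hence length 2m - 2, and `order` plays the role of the table from depth first search numbers to nodes.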

For each of the signatures, we tabulate what to do in connection with each possible leaf 
update. More precisely, for a leaf delete, we have a table that takes a signature of the tree and
the index of the leaf to be deleted and produces the signature of the tree without the leaf.
Thus when deleting a leaf, we first find its associated index so that we can use the table to
look up the new signature. Similarly, for a leaf insert, we assume the index of a preceding
sibling, or the parent if the leaf is to be a first child. The table should produce not only the 
new signature, but also the index of the new leaf. This index should be stored with the new 
leaf. Also, the leaf should be stored with the index in the table mapping indices to nodes. 

For the local updates, we have a table taking a signature and the index of a leaf to do the 
local update from. The table produces the index of the node to be marked, hence at which 
to do a local update. If a split or join is to be done, the table tells the indices of involved 
nodes. For a split, this includes the child at which to split the children sequence. Also, it 
includes the index of the new node. Finally, the table produces the signature of the resulting 
tree. All the above mentioned tables can easily be constructed in $m^{O(m)} = o(n)$ time and
space.

Let a be such that $n_a \le \sqrt{\log n} < n_{a+1}$ and set $m = n_a$. We are going to use the
above tabulation to deal with levels 0, ..., a of the multiway tree of Theorem 24. Note that if
$n_1 > \sqrt{\log n}$, a = 0, and then we can skip to the next subsection (Section 5.1.2). With each of the
level a nodes, we store the signature of the descending subtree as well as the table mapping 
indices to nodes. Also, with each leaf, we store an ancestor pointer to its level a ancestor. 
Then, when a leaf is added, it copies the ancestor pointer of one of its siblings. Via these 
ancestor pointers, we get the signature of the tree that is being updated. 

A new issue that arises is when level a nodes u and v get joined into u. For this case, we 
temporarily allow indices up to 2m - 1, and add m to the indices of nodes descending from
v. A table takes the signatures of the subtrees of u and v and produces the signature for the
joined tree with these new indices. Also, we place a forward pointer from v to u, so that
nodes below v can find their new ancestor in constant time. To get the index of a node, we 
take its current ancestor pointer. If it points to a node with a forward pointer, we add m 
to the stored index. Conversely, given an index, if it is not less than m, this tells us that we 
should use the old table from v, though subtracting m from the index. 

During the postprocessing of the join, we will traverse the subtree that descended from v. 
We move each node w to u, redirecting the ancestor pointers to u and give w a new unique 
index below m. Such an index exists because the total size of the tree after the join is at 
most m. The indexing is done using a table that suggests the index and the resulting new 
signature. The node is then inserted in the table at u mapping indices below m to nodes. 
Since we use the same general schedule as that in Theorem we know that we have $n_a/84$
updates below the join before the join needs to be completed. In that time, we can make
a post traversal of all the at most $n_a$ descendants of the join, assigning new indices and
updating parent pointers. We only deal with a constant number of descendants at a time.
For the traversal, we can use a depth first traversal, implemented locally as follows. At each
point in time, we are at some node w, going up or down. We start going down at the first
child of v from when the join was made. If we are going down, we move w to its first child.
If we are going up and there is a sibling to the left, we go to that sibling, going down from
there. If we are going up and there is no sibling to the left, we go up to the parent. At each
node, we check if it has already been moved to u by checking if the ancestor pointer points
to u. If we are about to join or split the traversal node w, we first move w away a constant
number of steps in the above traversal. This takes constant time, and does not affect the
time bounds for join and split.
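The local traversal described above can be sketched as a generator that keeps only the current node and a direction, so that a constant number of steps can be taken per leaf update. The class and attribute names are hypothetical, and the sketch moves to the next sibling in insertion order; the text moves to the sibling on the left, and the orientation does not matter for the argument.

```python
from itertools import islice


class TNode:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self.next_sibling = None

    def add(self, child):
        """Append child as the last child and link it to its predecessor."""
        if self.children:
            self.children[-1].next_sibling = child
        child.parent = self
        self.children.append(child)
        return child


def local_traversal(v):
    """Yield every proper descendant of v exactly once, using only
    first-child, next-sibling and parent moves."""
    if not v.children:
        return
    w, going_down = v.children[0], True
    while True:
        if going_down:
            yield w                      # first arrival at w: process it
            if w.children:
                w = w.children[0]
            else:
                going_down = False
        elif w.next_sibling is not None:
            w, going_down = w.next_sibling, True
        elif w.parent is v:
            return                       # back below v with nothing left
        else:
            w = w.parent
```

A caller can advance the traversal a constant number of nodes per leaf update with `islice(it, c)`, where `it` is the generator object and `c` is the chosen constant.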

A level a split of u into u and v is essentially symmetric but simpler in that we do not 
need to change the indices. In the traversal of the new subtree under v, we only need to 
redirect the ancestor pointers to v and to build the table mapping indices to nodes in the 
new subtree. 

The traversals take constant time for level a join and split processes for each descending
leaf update. In the next subsection, we are going to do corresponding traversals for two
other distinguished levels.

Including the new tables for index pairs, all tables are constructed in $m^{O(m)} = o(n)$ time
and space. With them, we implement the schedule of Theorem 24 for levels i = 0, ..., a using
constant time and a constant number of local update steps per leaf update, yet providing at
least $\sqrt{n_i}$ local updates for the postprocessing of each join or split.

5.1.2 Moving up the levels 

We are now going to implement the schedule of Theorem 24 on levels a + 1 and above. In
connection with a leaf update, we have constant time access to its level a ancestor, hence
also to its level a + 1 ancestor. We note that if $n_1 > \sqrt{\log n}$, a = 0, and then we are not using
any of the material from the previous subsection (Section 5.1.1). Then the whole construction will
be implementable on a pointer machine.

To get to levels a + 1 and above, we are going to use the following variant of a lemma of
Overmars and Levcopoulos [28]:

Lemma 27 Given p counters, all starting at zero, and an adversary incrementing these 
counters arbitrarily. Every time the adversary has made q increments, the increments being 
by one at the time, we subtract q from some largest counter, or set it to zero if it is below q. 
Then the largest possible counter value is $O(q \log p)$.

In the original lemma from [21], instead of subtracting q from a largest counter, they split 
it into two counters of equal size. That does not imply our case, so we need our own proof, 
which also happens to be much shorter. 

Proof: We want to show that the maximal number of counters larger than 2iq is at most
$p/2^i$. The proof is by induction. Obviously, the statement is true for i = 0, so consider i > 0.
Consider a time t where the number of counters larger than 2iq is maximized, and let $t^-$
be the last time before t at which the largest counter was (2i - 1)q.

We consider it one step to add 1 to q counters, and subtract q from a largest counter.
Obviously, at the end of the day, we can at most do q better in total.

The basic observation is that between $t^-$ and t, no change can increase the sum of the
counter excesses above (2i - 2)q, for whenever we subtract q it is from a counter which is
above (2i - 1)q. However, at time $t^-$, by induction, we had only $p/2^{i-1}$ counters above
(2i - 2)q, and each had an excess of at most q. To get to 2iq, a counter needs twice this
excess, and since the total excess can only go down, this can happen for at most half the
counters. ■
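The counter game of Lemma 27 is easy to simulate. The sketch below plays it against a seeded random adversary, which is only one possible adversary and not a worst-case one, and reports the largest value ever seen. The function name is hypothetical, and the comparison bound $2q(\lfloor \log_2 p \rfloor + 1) + q$ mentioned afterwards is read off from the induction above as an assumption of this sketch, not a constant stated in the text.

```python
import random


def play_counter_game(p, q, rounds, seed=0):
    """Simulate the game: in each round the adversary makes q unit
    increments, then q is subtracted from a largest counter (or it is
    set to zero if it is below q). Returns the largest value observed."""
    rng = random.Random(seed)
    counters = [0] * p
    max_seen = 0
    for _ in range(rounds):
        for _ in range(q):
            j = rng.randrange(p)         # adversary's choice (random here)
            counters[j] += 1
            max_seen = max(max_seen, counters[j])
        big = max(range(p), key=counters.__getitem__)
        counters[big] = max(0, counters[big] - q)
    return max_seen
```

For example, `play_counter_game(16, 4, 2000)` can be compared against $2 \cdot 4 \cdot (4 + 1) + 4 = 44$; a random adversary stays far below that.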
For the implementation of Lemma 27, we have

Lemma 28 Spending constant time per counter increment in Lemma 27, the largest counter
to be reduced can be found in constant time.

Proof: We simply maintain a doubly linked sorted list of counter values, and with each value
we have a bucket with the counters with that value. When a counter c is increased from x
to x + 1, we check the value x' after x in the value list. If x' > x + 1, we insert x + 1 into
the value list with an associated bucket. We now move c to the bucket of x + 1, removing
x if its bucket gets empty. Decrements by one can be handled symmetrically. Thus, when
a largest counter has been picked, during the next q increments, we can decrement it by
one per increment. ■
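A minimal sketch of the bucket list from this proof follows. It keeps a doubly linked list of non-empty buckets sorted by value, so a unit increment or decrement only touches a bucket and its neighbor, and a largest counter is any member of the last bucket. Class and method names are hypothetical, and at least one counter is assumed.

```python
class Bucket:
    def __init__(self, value):
        self.value = value
        self.members = set()
        self.prev = None
        self.next = None


class CounterBuckets:
    def __init__(self, names):
        first = Bucket(0)
        first.members = set(names)
        self.head = self.tail = first
        self.where = {c: first for c in names}

    def _link(self, left, nb, right):
        """Insert bucket nb between neighbors left and right (either may be None)."""
        nb.prev, nb.next = left, right
        if left is None:
            self.head = nb
        else:
            left.next = nb
        if right is None:
            self.tail = nb
        else:
            right.prev = nb

    def _move(self, c, old, new):
        old.members.discard(c)
        new.members.add(c)
        self.where[c] = new
        if not old.members:              # unlink the emptied bucket
            if old.prev is None:
                self.head = old.next
            else:
                old.prev.next = old.next
            if old.next is None:
                self.tail = old.prev
            else:
                old.next.prev = old.prev

    def increment(self, c):
        b = self.where[c]
        if b.next is None or b.next.value != b.value + 1:
            self._link(b, Bucket(b.value + 1), b.next)
        self._move(c, b, b.next)

    def decrement(self, c):
        b = self.where[c]
        if b.value == 0:
            return
        if b.prev is None or b.prev.value != b.value - 1:
            self._link(b.prev, Bucket(b.value - 1), b)
        self._move(c, b, b.prev)

    def value(self, c):
        return self.where[c].value

    def largest(self):
        """Some counter with maximum value, found in constant time."""
        return next(iter(self.tail.members))

    def values(self):
        """Bucket values from smallest to largest (for inspection only)."""
        out, b = [], self.head
        while b is not None:
            out.append(b.value)
            b = b.next
        return out
```

Every operation except `values` touches a constant number of buckets, which is what the lemma needs.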

We are going to use the above lemmas in two bands, one on levels a + 1, ..., b where b is
such that $n_b \le (\log n)^{\log\log n} < n_{b+1}$, and one on levels b + 1 and up. First, we consider levels
a + 1, ..., b.

To describe the basic idea, for simplicity, we temporarily assume that there are no joins
or splits. Set q = b - a. For i = a + 1, ..., b, during $\Omega(n_i)$ leaf updates below a node v on level
i, we will get $\Omega(n_i/q)$ local updates at v. Since $n_{i+1} > 18 n_i$, $q < \log_{18}(n_b/n_a) < (\log\log n)^2$.
On the other hand, $n_i \ge n_{a+1} > \sqrt{\log n}$, so $q = o(\sqrt{n_i})$.

Each level a + 1 node v has a counter that is incremented every time we have a leaf
update below v. In the degenerate case where a = 0, we always make a local update at v so
as to get enough updates on level 1 as required by Theorem 24. We make an independent
schedule for the subtree descending from each level b node u. Once for every q updates below
u, we pick a descending level a + 1 node v with the largest counter, do a local update at v, and
subtract q from the counter. During the next q - 1 leaf updates below u, we follow the path
up from v to u, doing a local update at each node on the way.

A largest counter below u is maintained as described in Lemma 28. The number of
counters below u is at most $p = n_b/(n_a/4)$, so by Lemma 27 the maximal counter value is
$O(q \log p) = O((\log n_b)^2) = O((\log\log n)^4)$.

Now, for i = a + 1, ..., b, consider a level i node w. The maximal number of counters
below w is $n_i/(n_{a+1}/4)$, so their total value is at most

$$O((n_i/n_{a+1})(\log\log n)^4) = O((n_i/\sqrt{\log n})(\log\log n)^4) = o(n_i).$$

Each update below w adds one to this number. Moreover, we do a local update at w every
time we subtract q from one of the descending counters, possibly excluding the very last
subtraction if we have not passed w on the path up to u. Consequently, during $r = \Omega(n_i)$
leaf updates below w, the number of local updates at w is at least

$$(r - o(n_i) - q)/q = \Omega(n_i/q) = \omega(\sqrt{n_i}).$$

Next, we show how to maintain approximate weights. For the nodes v on level a + 1, we
assume we know the exact weight $W_v$. For nodes w on levels i = a + 1, ..., b, we have
an approximate weight $\widetilde{W}_w$. When the counter of a level a + 1 node v is picked, we set
$\Delta = W_v - \widetilde{W}_v$ and $\widetilde{W}_v = W_v$. As we move up from v to u during the next q - 1 updates, at
each node w, we set $\widetilde{W}_w = \widetilde{W}_w + \Delta$.

We will now argue that for any node w on level i = a + 1, ..., b, the absolute error in our
approximate weight $\widetilde{W}_w$ is $o(n_i)$. The error in $\widetilde{W}_w$ is at most the sum of the counters below
w plus q, and we have already seen above that this value is $o(n_i)$. It follows that

$$\widetilde{W}_w = (1 \pm o(1)) W_w.$$

This error is small enough that we can use the approximate weights for the scheduling of split 
and joins. More precisely, in the analysis, we rounded at various points, and the rounding 
left room for errors below a constant fraction. 

We are now ready to describe the details of the schedule as nodes get joined and split.
From the last subsection, we have ancestor pointers to level a, and via tables we can also get
the exact weight. From this, we can easily get ancestor pointers and exact weights on level
a + 1. On level a + 1, we can then run the join and split schedule from Section 3.3.

For level i = a + 2, ..., b, we use the approximate weights both for the nodes and for the
children. When we get a local update at a node w, we know that $\widetilde{W}_w$ has just been updated
and that it equals the sum of the weights of the children, so we do have local consistency
in the approximate weights. We then use the new approximate weight in the schedule of
Proposition El to check if w is to be joined or split or tied to some other join or split process.
The local update step is applied to any join or split process neighboring w.

Finally, we use the traversal technique from the last subsection to maintain ancestor
pointers to level b nodes. This means that we use constant time on level b in connection
with each leaf update. In connection with a join or split postprocessing on level b, this time
also suffices to join or split the priority queue over counters below the processed nodes. This
completes our maintenance of levels a + 1, ..., b.

For the levels above b, we use the same technique as we did for levels a + 1, ..., b, but
with the simplification that we have only one tree induced by levels above b. Consequently,
we have only one priority queue over all counters on level b. The numbers, however, are a
bit different. This time, the number q' of levels is $\log_{18}(n/n_b) < \log n$. However, for i > b,
$n_i > (\log n)^{\log\log n}$, so $q' = o(\sqrt{n_i})$.

We have one priority queue over all counters on level b, of which there are at most
$p' = n/(n_{b+1}/4)$, so by Lemma 27 the maximal counter value is $O(q' \log p') = O(\log n (\log\log n)^2)$.

Now, for i > b, consider a level i node w. The maximal number of counters below w is
$n_i/(n_{b+1}/4)$, so their total value is at most

$$O((n_i/n_{b+1}) \log n (\log\log n)^2) = O((n_i/(\log n)^{\log\log n}) \log n (\log\log n)^2) = o(n_i).$$

With the above changes in numbers, we use the same technique for levels above b as we used
for levels a + 1, ..., b. This completes the proof of Theorem 24.

Corollary 29 Given a number series $n_0, n_1, n_2, \ldots$, with $n_0 = 1$, $n_1 \ge 84$, and $n_i^2 > n_{i+1} >
18 n_i$, we maintain a multiway tree where each node on height i which is neither the root nor
a leaf node has weight between $n_i/4$ and $n_i$. If an S-structure for a node on height i can
be built in $O(\sqrt{n_{i-1}})$ time, or $O(n_1)$ time for level 1, we can maintain S-structures for the
whole tree in constant time per finger update.

Proof: We use the same proof as the one we used to prove Corollary El from Theorem 1241 



5.2 Fast finger search 

We first concentrate on implementing a fast finger search, postponing the constant time finger 
updates to the next subsections. For simplicity, we will always assume that the fingered key 
x is smaller than the key y sought for. The other case can be handled by a "mirror" data 
structure where each key x is replaced by U — x, where U is the largest possible key. 

The following finger search analog of Theorem is obtained using the same kind of 
methods as for pointer based finger search structures, i.e. by the use of horizontal links. 

Theorem 30 Suppose a static search structure on d integer keys can be constructed in
$O(d^{(k-1)/2})$, $k \ge 2$, time and space so that given a finger to a stored key x, we can search
a key y > x in time S(d, y - x). We can then construct a dynamic linear space search
structure that with n integer keys supports finger updates in constant time and finger
searches in time T(q, y - x) where q is the number of stored keys between x and y and
$T(n) \le T(n^{1-1/k}) + O(S(n, y - x))$. Here S is supposed to be non-decreasing in both
arguments. The reduction itself uses only standard $AC^0$ operations.

Proof: We use an exponential search tree where on each level we have horizontal links 
between neighboring nodes. It is trivial to modify join and split to leave horizontal pointers 
between neighboring nodes on the same level. 

A level i node has $O(n_i/n_{i-1}) = O(n_i^{1/k})$ children, so, by assumption, its S-structure is
built in time $O(n_i^{(k-1)/(2k)}) = O(\sqrt{n_{i-1}})$. Hence we can apply Corollary 29 and maintain
S-structures at all nodes in constant time per finger update.

To search for y > x, given a finger to x, we first traverse the path up the tree from the 
leaf containing x. At each level, we examine the current node and its right neighbor until a 
node v is found that contains y. Here the right neighbor is found in constant time using the 
horizontal links between neighbors. As we shall see later, the node v has the advantage that 
its largest possible degree is closely related to q. 

Let u be the child of v containing x and let x' be the separator immediately to the right 
of u. Then, x < x' < y, and if we start our search from x', we will find the child w where y
belongs in $S(d, y - x') \le S(d, y - x)$ time, where d is the degree of v.

We now search down from w for y. At each visited node, the left splitter x' satisfies 
x < x' < y so we start our search from the left splitter. 

We are now going to argue that the search time is $T(q, y - x) \le T(q^{1-1/k}) + O(S(q, y - x))$,
as stated in the theorem. Let i be the level of the node v. Let u be the level i - 1 ancestor of the
leaf containing x, and let u' be the right neighbor of u. By definition of v, y does not belong
to u', and hence all keys below u' are between x and y. It follows that $q \ge n(u') \ge n_{i-1}/10$.
Now, the recursive search bound follows using the argument from the proof of Lemma ED. ■

Note in the above theorem that it does not matter whether the static search structure
supports efficient finger search in terms of the number d of intermediate keys. For example,
the static search bound of $O(\sqrt{\log n/\log\log n})$ from [7] immediately implies a dynamic
finger search bound of $O(\sqrt{\log q/\log\log q})$ where q is the number of stored keys between the
fingered key x and the sought key y. However, if we want efficiency in terms of y - x, we
need the following result.

Lemma 31 A data structure storing a set X of d keys from a universe of size U can be
constructed in $d^{O(1)}$ time and space such that given a finger to a stored key $x \in X$, we can search
a key y > x in time $O(\log\log(y - x)/\log\log\log(y - x))$.

Proof: Beame and Fich [7] have shown that a polynomial space search structure can be
constructed with search time $O(\min\{\sqrt{\log n/\log\log n}, \log\log U/\log\log\log U\})$, where n is the
number of keys and $U = 2^W$ is the size of the universe they are drawn from. As a start, we will
have one such structure over our d keys. This gives us a search time of $O(\sqrt{\log d/\log\log d})$.
Hence we are done if $\log\log(y - x)/\log\log\log(y - x) = \Omega(\sqrt{\log d/\log\log d})$, and this is the
case if $y - x \ge 2^d$.

Now, for each key $x \in X$, and for $i = 0, \ldots, \log\log d$, we will have a search structure
$S_{x,i}$ over the keys in the range $[x, x + 2^{2^{2^i}})$, with search time
$O(\log\log 2^{2^{2^i}}/\log\log\log 2^{2^{2^i}}) = O(2^i/i)$. Then to find $y < x + 2^d$, we look in
$S_{x,\lceil \log\log\log(y-x) \rceil}$. Now, $2^{2^{2^{\lceil \log\log\log(y-x) \rceil}}} < (y-x)^{\log(y-x)}$, so
the search time is $O(\log\log (y-x)^{\log(y-x)}/\log\log\log (y-x)^{\log(y-x)}) = O(\log\log(y-x)/\log\log\log(y-x))$.

It should be noted that it is not a problem to find the appropriate $S_{x,i}$. Even if for each
x, we store the $S_{x,i}$ as a linked list together with the upper limit value of $x + 2^{2^{2^i}}$, we can
get to the appropriate $S_{x,i}$ by starting at $S_{x,0}$ and moving to larger $S_{x,i}$ until $y < x + 2^{2^{2^i}}$.
This takes $O(\log\log\log(y - x)) = o(\log\log(y - x)/\log\log\log(y - x))$ steps.
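The linear walk over the structures can be sketched as below. The function names are hypothetical, and `max_i` stands for the largest index that is stored for x, which is about log log d in the text.

```python
def range_limit(i: int) -> int:
    """Width 2^(2^(2^i)) of the key range covered by the i-th structure."""
    return 2 ** (2 ** (2 ** i))


def pick_structure(x: int, y: int, max_i: int):
    """Smallest i with y < x + 2^(2^(2^i)), found by walking i = 0, 1, 2, ...
    Returns None if y lies beyond the range of the last stored structure."""
    i = 0
    while i <= max_i and y >= x + range_limit(i):
        i += 1
    return i if i <= max_i else None
```

The widths grow as 4, 16, 65536, $2^{256}$, ..., so the walk stops after about log log log(y - x) steps.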

Finally, concerning space and construction time, since we only have $O(\log\log d)$ search
structures for each of the d elements in X, polynomiality follows from polynomiality of the
search structure of Beame and Fich. ■

Proof of Theorem 5: The result follows directly from the reduction of Theorem 30
together with the static search structures in Theorem El Theorem El and Lemma 31. ■

6 String searching 

In this section, we prove Theorem 3.

6.1 Preliminaries

Our string searching result utilizes Corollary 0] and Proposition 22.

Tries As a basic component, we use a trie over the strings where the characters are 1-word
integers §111]. For technical reasons, we assume that each string ends with a special
character $\bot$, hence that no string is a prefix of any other string. Abstractly, a trie over a
set S of strings is the rooted tree whose nodes are the prefixes of strings in S. For a string
$\alpha$ and a 1-word character a, the node $\alpha a$ has parent $\alpha$ and is labeled a. The root is not
labeled, so $\alpha a$ is the sequence of labels encountered on the path from the root to the node $\alpha a$. Our trie
is ordered in the sense that the children of a node are ordered according to their labels. We
use standard path (or Patricia) compression, so paths of unary nodes are stored implicitly
by pointers to the stored strings. Hence the trie data structure is really concentrated on the
O(n) branching nodes.

By storing appropriate pointers, the problem of searching a string x among the stored
strings S reduces to (1) finding the longest common prefix $\alpha$ between x and the strings
in S, and (2) searching the next 1-word character of x among the labels of the children
of the trie node $\alpha$. In a static implementation, we would use a dictionary in each node,
which would allow us to spend constant time at each visited node during step (1). Then, by
keeping a search structure from Corollary 0] at each branching node, we perform step (2) in
$O(\sqrt{\log n/\log\log n})$ time, which is fine.

However, in a dynamic setting we cannot use dictionaries in each node over all children
since we cannot update linear space dictionaries efficiently in the worst case. Instead, we
will sometimes allow step (1) to spend more than constant time in each visited node. This 
is fine as long as the total time spent in step (1) does not exceed the total bound aimed at. 

6.2 Efficient traversal down a trie 

Our new idea is to only apply the constant time dictionaries to some of the children. In
a trie node, we distinguish between "heavy" and "light" children, depending on their number of
descendant leaves. The point is that heavy children will remain for a long time, and hence
we can store them in a dictionary which is rebuilt during a relatively slow rebuilding process.
For light children, we cannot use a dictionary; instead, we store them in a dynamic search
structure from Corollary HJ. Although this will give rise to a non-constant search time, we
are still fine since the low weight of the found child will guarantee that the problem size has
decreased enough to compensate for the search effort.
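
As a rough illustration of the heavy/light split, the following Python sketch looks up a child by first probing a dictionary of heavy children and only then searching the light ones. It is a static simplification under stated assumptions: the class `Node`, its `threshold` parameter, and the method `find_child` are hypothetical names, and a sorted list with binary search stands in for the dynamic search structure.

```python
from bisect import bisect_left


class Node:
    """Trie node whose children are split into heavy and light ones."""

    def __init__(self, children, threshold):
        # children: iterable of (label, weight, child) triples, where weight is
        # the number of descendant leaves of the child.
        # threshold: assumed cut-off; children at or above it count as heavy.
        self.heavy = {}  # constant-time dictionary over heavy children
        light = []
        for label, weight, child in children:
            if weight >= threshold:
                self.heavy[label] = child
            else:
                light.append((label, child))
        light.sort(key=lambda pair: pair[0])
        self.light_labels = [label for label, _ in light]
        self.light_children = [child for _, child in light]

    def find_child(self, label):
        """Return (child, kind), kind being "heavy" or "light", or (None, None)."""
        child = self.heavy.get(label)
        if child is not None:
            return child, "heavy"          # constant time
        i = bisect_left(self.light_labels, label)
        if i < len(self.light_labels) and self.light_labels[i] == label:
            return self.light_children[i], "light"  # non-constant time
        return None, None
```

A light answer costs more than constant time, but it also certifies that the search continues in a subtree of low weight, which is what the accounting in this section relies on.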

In more detail: At a node with weight $m$, we only store heavy children with $\Omega(m^{1-1/k})$
descendant keys in a dictionary, where $k = 2 + \varepsilon$ is the exponent from the dictionaries in
Proposition 1221; the other children are stored in a dynamic search structure (an exponential
search tree). Our string searching time is then $O(\ell)$ for the use of dictionaries and for
following pointers. The total cost of using the search structures is bounded by $T(n)$, where

\[ T(m) \le O(\sqrt{\log m/\log\log m}) + T(m^{1-1/k}) = O(\sqrt{\log m/\log\log m}). \]

Adding up, our total time bound is $O(\sqrt{\log n/\log\log n} + \ell)$, which is optimal.
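
One way to check the solution of the recurrence is sketched below, assuming base-2 logarithms and that $T$ is constant below some constant threshold. Write $x = \log m$ and $q = 1 - 1/k < 1$. After $i$ levels the weight is $m_i = m^{q^i}$, so $\log m_i = q^i x$, and unrolling gives

\[ T(m) \le \sum_{i \ge 0} O\bigl(\sqrt{q^i x/\log(q^i x)}\bigr). \]

For the levels with $q^i x \ge \sqrt{x}$ we have $\log(q^i x) \ge \tfrac{1}{2}\log x$, so the $i$-th term is at most $\sqrt{2}\, q^{i/2} \sqrt{x/\log x}$, and these terms form a geometric series summing to $O(\sqrt{x/\log x})$. For the remaining levels with $\log(q^i x) \ge 1$, each term is at most $\sqrt{q^i x}$; these terms again decrease geometrically and the largest is below $x^{1/4}$, so they contribute $O(x^{1/4}) = o(\sqrt{x/\log x})$. Together,

\[ T(m) = O(\sqrt{x/\log x}) = O(\sqrt{\log m/\log\log m}). \]
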
We maintain the dictionary of a node $v$ by periodic rebuilding. We maintain an unordered
list of all children with more than $m^{1-1/k}/2$ descendants. In every period, we first scan the
list, and then build a dictionary over the labels of the scanned children. When the new
dictionary is completed, it replaces the previous one in constant time. There are only $O(m^{1/k})$
labels, so this takes $O(m^{(1/k)(k-1)}) = O(m^{1-1/k})$ time. Hence, spending $O(1)$ time per update
to a descendant, we can complete a period for every $m^{1-1/k}/4$ updates, and this ensures
that no child can have more than $m^{1-1/k}$ descendants without being in the current dictionary.
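
The invariant can be checked with a short calculation, sketched here under the simplifying assumption that $m$ stays fixed over the periods involved. Let $w = m^{1-1/k}$. A child enters the list when it exceeds $w/2$ descendants. The period in progress at that moment may already have scanned the list, but it ends within $w/4$ updates, and the next period, which does scan the child, completes within another $w/4$ updates. Since each update changes the number of descendants of the child by at most one, the child has at most

\[ \frac{w}{2} + \frac{w}{4} + \frac{w}{4} = w = m^{1-1/k} \]

descendants when the dictionary containing it is installed. Similarly, at most $m/(w/2) = 2m^{1/k}$ children can have more than $w/2$ descendants at the same time, which matches the $O(m^{1/k})$ bound on the number of labels in a rebuild.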

The space bound is proven in a rather straightforward manner. 

Practical simplifications In our reduction, we used a polynomial space dictionary from
Proposition 1221. By increasing the exponent, we can allow ourselves to use even simpler and
faster hashing schemes, such as 1-level hashing with quadratic space, which would remove
collision handling. This way of using space seems to be a good idea also for more practical
randomized data structures.
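
To make the remark about quadratic space concrete, here is a minimal randomized Python sketch of 1-level hashing. It is an illustration only, not the dictionary of the proposition used above: the multiply-add hash family modulo the prime `P`, and the names `build_quadratic_table` and `contains`, are assumptions introduced here. With a table of size $d^2$ for $d$ keys, a random function from such a family is collision-free with probability above 1/2, so the retry loop needs two attempts in expectation and no collision handling is required at query time.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to be integers in [0, P)


def build_quadratic_table(keys):
    """Return (a, b, table) where table has len(keys)**2 slots and no collisions."""
    keys = list(set(keys))
    size = max(1, len(keys) ** 2)
    while True:
        # Draw a random hash function x -> ((a*x + b) mod P) mod size.
        a = random.randrange(1, P)
        b = random.randrange(P)
        table = [None] * size
        collision = False
        for x in keys:
            i = ((a * x + b) % P) % size
            if table[i] is not None:
                collision = True  # retry with a fresh function
                break
            table[i] = x
        if not collision:
            return a, b, table


def contains(structure, x):
    """Membership test with a single probe and no collision handling."""
    a, b, table = structure
    return table[((a * x + b) % P) % len(table)] == x
```

The price for the single probe is the quadratic table size, which is affordable here because dictionaries are only kept over the few heavy children of a node.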

7 Other applications of our techniques 

In this section we discuss how the techniques presented in this paper have been applied in 
other contexts. 

Variants of exponential search trees have been instrumental in many of the previous 
strongest results on deterministic linear integer space sorting and priority queues [21 EDI 01221 • 
Here a priority queue is a dynamic set for which we maintain the minimum element. When
first introduced by Andersson [2], they provided the then strongest time bounds of $O(\sqrt{\log n})$
for priority queues and $O(n\sqrt{\log n})$ for sorting. As noted by Thorup in [30], we can surpass
the $\Omega(\sqrt{\log d/\log\log d})$ lower bound for static polynomial space searching in a set of size $d$
if, instead of processing one search at a time, we process a batch of $d$ searches. Thorup got the
time per key in the batch down to $O(\log\log d)$. In order to exploit this, Thorup developed
an exponential priority queue tree where the update time was bounded by (JTJ), but with $S(n)$
being the per key cost of batched searching. Thus he got priority queues with an update time
of $O((\log\log n)^2)$ and hence sorting in $O(n(\log\log n)^2)$ time. Thorup's original construction
was amortized, but a worst-case construction was later presented by Andersson and Thorup
0. More advanced static structures for batched searching were later developed by Han
[22], who also increased the batch size to $d^2$. He then ended up with a priority queue update
time of $O((\log\log n)(\log\log\log n))$ and sorting in $O(n(\log\log n)(\log\log\log n))$ time. However,
exponential search trees are not used in Han's recent deterministic $O(n\log\log n)$ time sorting
in linear space [22] or in Thorup's [31] corresponding priority queue with $O(\log\log n)$ update
time. Since (JTJ) cannot give bounds below $O(\log\log n)$ per key, it looks as if the role of
exponential search trees is played out in the context of integer sorting and priority queues.

Recently, Bender, Cole, and Raman [S] have used our techniques to derive worst-case
efficient cache-oblivious algorithms for several data structure problems. This nicely highlights
that the exponential search trees themselves are not restricted to integer domains. It just
happens that our applications in this paper are for integers.
Theorem |H] provides a general tool for maintaining balance in multiway trees. This
kind of technique has been used before, but it has never been described in such a
general, independent, quotable way. By using our theorems, many proofs of dynamization can
be simplified, and in particular, we can avoid the standard hand-waving, claiming without
proper proof that amortized constructions can be deamortized. The second author [31]
has recently used our Proposition in a general reduction from priority queue to sorting,
providing a priority queue whose update cost is the per key cost of sorting. Also, he [32] has
recently used Theorem |H] in a space efficient solution to dynamic stabbing, i.e., the problem
of maintaining a dynamic set of intervals where the query is to find an interval containing a
given point. This captures problems like method look-up in object oriented programming and
IP classification for firewalls on the internet. The solution has query time $O(k)$, update time
$O(n^{1/k})$, and uses linear space. Previous solutions used space $O(n^{1+1/k})$. The solution does
not involve any search structure, so it is important that Theorem |S] has a general format not
specialized to search applications.

8 An open problem 

It is an interesting open problem what the right complexity is for searching with standard,
or even non-standard, $AC^0$ operations. Andersson et al. [3] have shown that even
if we allow non-standard $AC^0$ operations, the exact complexity of membership queries is
$\Theta(\sqrt{\log n/\log\log n})$. This contrasts with the situation on the RAM, where we can get
down to constant time for membership queries. Interestingly, $\Theta(\sqrt{\log n/\log\log n})$ is also
the RAM lower bound for searching, so the question is whether it is possible to do the
$\Theta(\sqrt{\log n/\log\log n})$ searching using $AC^0$ operations only.